Assumptions for ANOVA: the right way to check the normality

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110105/19d2ed7b/attachment.pl>
Someone suggested me that I don?t have to check the normality of the 
data, but
the normality of the residuals I get after the fitting of the  linear 
model.
I really ask you to help me to understand this point as I don?t find 
enough
material online where to solve it.
Try the following:
# using your scrd data and your proposed models
fit1<- lm(response ~ stimulus + condition + stimulus:condition, data=scrd)
fit2<- lm(response ~ stimulus + condition, data=scrd)
fit3<- lm(response ~ condition, data=scrd)

# Set up for 6 plots on 1 panel
op = par(mfrow=c(2,3))

# residuals function extracts residuals
# Visual inspection is a good start for checking normality
# You get a much better feel than from some "magic number" statistic
hist(residuals(fit1))
hist(residuals(fit2))
hist(residuals(fit3))

# especially qqnorm() plots which are linear for normal data
qqnorm(residuals(fit1))
qqnorm(residuals(fit2))
qqnorm(residuals(fit3))

# Restore plot parameters
par(op)
If the data are not normally distributed I have to use the kruskal wallys 
test
and not the ANOVA...so please help
me to understand.
Indeed - Kruskal-Wallis is a good test to use for one factor data that is 
ordinal so it is a good alternative to your fit3.
Your "response" seems to be a discrete variable rather than a continuous 
variable.
You must decide if it is reasonable to approximate it with a normal 
distribution which is by definition continuous.
I make a numerical example, could you please tell me if the data in this 
table
are normally distributed or not?

Help!

number                  stimulus condition response
1             flat_550_W_realism         A        3
2             flat_550_W_realism         A        3
3             flat_550_W_realism         A        5
4             flat_550_W_realism         A        3
5             flat_550_W_realism         A        3
6             flat_550_W_realism         A        3
7             flat_550_W_realism         A        3
8             flat_550_W_realism         A        5
9             flat_550_W_realism         A        3
10            flat_550_W_realism         A        3
11            flat_550_W_realism         A        5
12            flat_550_W_realism         A        7
13            flat_550_W_realism         A        5
14            flat_550_W_realism         A        2
15            flat_550_W_realism         A        3
16            flat_550_W_realism        AH        7
17            flat_550_W_realism        AH        4
18            flat_550_W_realism        AH        5
19            flat_550_W_realism        AH        3
20            flat_550_W_realism        AH        6
21            flat_550_W_realism        AH        5
22            flat_550_W_realism        AH        3
23            flat_550_W_realism        AH        5
24            flat_550_W_realism        AH        5
25            flat_550_W_realism        AH        7
26            flat_550_W_realism        AH        2
27            flat_550_W_realism        AH        7
28            flat_550_W_realism        AH        5
29            flat_550_W_realism        AH        5
30         bump_2_step_W_realism         A        1
31         bump_2_step_W_realism         A        3
32         bump_2_step_W_realism         A        5
33         bump_2_step_W_realism         A        1
34         bump_2_step_W_realism         A        3
35         bump_2_step_W_realism         A        2
36         bump_2_step_W_realism         A        5
37         bump_2_step_W_realism         A        4
38         bump_2_step_W_realism         A        4
39         bump_2_step_W_realism         A        4
40         bump_2_step_W_realism         A        4
41         bump_2_step_W_realism        AH        3
42         bump_2_step_W_realism        AH        5
43         bump_2_step_W_realism        AH        1
44         bump_2_step_W_realism        AH        5
45         bump_2_step_W_realism        AH        4
46         bump_2_step_W_realism        AH        4
47         bump_2_step_W_realism        AH        5
48         bump_2_step_W_realism        AH        4
49         bump_2_step_W_realism        AH        3
50         bump_2_step_W_realism        AH        4
51         bump_2_step_W_realism        AH        5
52         bump_2_step_W_realism        AH        4
53         hole_2_step_W_realism         A        3
54         hole_2_step_W_realism         A        3
55         hole_2_step_W_realism         A        4
56         hole_2_step_W_realism         A        1
57         hole_2_step_W_realism         A        4
58         hole_2_step_W_realism         A        3
59         hole_2_step_W_realism         A        5
60         hole_2_step_W_realism         A        4
61         hole_2_step_W_realism         A        3
62         hole_2_step_W_realism         A        4
63         hole_2_step_W_realism         A        7
64         hole_2_step_W_realism         A        5
65         hole_2_step_W_realism         A        1
66         hole_2_step_W_realism         A        4
67         hole_2_step_W_realism        AH        7
68         hole_2_step_W_realism        AH        5
69         hole_2_step_W_realism        AH        5
70         hole_2_step_W_realism        AH        1
71         hole_2_step_W_realism        AH        5
72         hole_2_step_W_realism        AH        5
73         hole_2_step_W_realism        AH        5
74         hole_2_step_W_realism        AH        2
75         hole_2_step_W_realism        AH        6
76         hole_2_step_W_realism        AH        5
77         hole_2_step_W_realism        AH        5
78         hole_2_step_W_realism        AH        6
79     bump_2_heel_toe_W_realism         A        3
80     bump_2_heel_toe_W_realism         A        3
81     bump_2_heel_toe_W_realism         A        3
82     bump_2_heel_toe_W_realism         A        2
83     bump_2_heel_toe_W_realism         A        3
84     bump_2_heel_toe_W_realism         A        3
85     bump_2_heel_toe_W_realism         A        4
86     bump_2_heel_toe_W_realism         A        3
87     bump_2_heel_toe_W_realism         A        4
88     bump_2_heel_toe_W_realism         A        4
89     bump_2_heel_toe_W_realism         A        6
90     bump_2_heel_toe_W_realism         A        5
91     bump_2_heel_toe_W_realism         A        4
92     bump_2_heel_toe_W_realism        AH        7
93     bump_2_heel_toe_W_realism        AH        3
94     bump_2_heel_toe_W_realism        AH        4
95     bump_2_heel_toe_W_realism        AH        2
96     bump_2_heel_toe_W_realism        AH        5
97     bump_2_heel_toe_W_realism        AH        6
98     bump_2_heel_toe_W_realism        AH        4
99     bump_2_heel_toe_W_realism        AH        4
100    bump_2_heel_toe_W_realism        AH        4
101    bump_2_heel_toe_W_realism        AH        5
102    bump_2_heel_toe_W_realism        AH        2
103    bump_2_heel_toe_W_realism        AH        6
104    bump_2_heel_toe_W_realism        AH        5
105    hole_2_heel_toe_W_realism         A        3
106    hole_2_heel_toe_W_realism         A        3
107    hole_2_heel_toe_W_realism         A        1
108    hole_2_heel_toe_W_realism         A        3
109    hole_2_heel_toe_W_realism         A        3
110    hole_2_heel_toe_W_realism         A        5
111    hole_2_heel_toe_W_realism         A        2
112    hole_2_heel_toe_W_realism        AH        5
113    hole_2_heel_toe_W_realism        AH        1
114    hole_2_heel_toe_W_realism        AH        3
115    hole_2_heel_toe_W_realism        AH        6
116    hole_2_heel_toe_W_realism        AH        5
117    hole_2_heel_toe_W_realism        AH        4
118    hole_2_heel_toe_W_realism        AH        4
119    hole_2_heel_toe_W_realism        AH        3
120    hole_2_heel_toe_W_realism        AH        3
121    hole_2_heel_toe_W_realism        AH        1
122    hole_2_heel_toe_W_realism        AH        5
123 bump_2_combination_W_realism         A        4
124 bump_2_combination_W_realism         A        2
125 bump_2_combination_W_realism         A        4
126 bump_2_combination_W_realism         A        1
127 bump_2_combination_W_realism         A        4
128 bump_2_combination_W_realism         A        4
129 bump_2_combination_W_realism         A        2
130 bump_2_combination_W_realism         A        4
131 bump_2_combination_W_realism         A        2
132 bump_2_combination_W_realism         A        4
133 bump_2_combination_W_realism         A        2
134 bump_2_combination_W_realism         A        6
135 bump_2_combination_W_realism        AH        7
136 bump_2_combination_W_realism        AH        3
137 bump_2_combination_W_realism        AH        4
138 bump_2_combination_W_realism        AH        1
139 bump_2_combination_W_realism        AH        6
140 bump_2_combination_W_realism        AH        5
141 bump_2_combination_W_realism        AH        5
142 bump_2_combination_W_realism        AH        6
143 bump_2_combination_W_realism        AH        5
144 bump_2_combination_W_realism        AH        4
145 bump_2_combination_W_realism        AH        2
146 bump_2_combination_W_realism        AH        4
147 bump_2_combination_W_realism        AH        2
148 bump_2_combination_W_realism        AH        5
149 hole_2_combination_W_realism         A        5
150 hole_2_combination_W_realism         A        2
151 hole_2_combination_W_realism         A        4
152 hole_2_combination_W_realism         A        1
153 hole_2_combination_W_realism         A        5
154 hole_2_combination_W_realism         A        4
155 hole_2_combination_W_realism         A        3
156 hole_2_combination_W_realism         A        5
157 hole_2_combination_W_realism         A        2
158 hole_2_combination_W_realism         A        5
159 hole_2_combination_W_realism         A        5
160 hole_2_combination_W_realism         A        1
161 hole_2_combination_W_realism        AH        7
162 hole_2_combination_W_realism        AH        5
163 hole_2_combination_W_realism        AH        3
164 hole_2_combination_W_realism        AH        1
165 hole_2_combination_W_realism        AH        6
166 hole_2_combination_W_realism        AH        4
167 hole_2_combination_W_realism        AH        7
168 hole_2_combination_W_realism        AH        5
169 hole_2_combination_W_realism        AH        5
170 hole_2_combination_W_realism        AH        2
171 hole_2_combination_W_realism        AH        6
172 hole_2_combination_W_realism        AH        2
173 hole_2_combination_W_realism        AH        4

Thanks in advance

[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110105/e5a35a4c/attachment.pl>
Remember that an non-significant result (especially one that is still near alpha like yours) does not give evidence that the null is true.  The reason that the 1st 2 tests below don't show significance is more due to lack of power than some of the residuals being normal.  The only test that I would trust for this is SnowsPenultimateNormalityTest (TeachingDemos package, the help page is more useful than the function itself).

But I think that you are mixing up 2 different concepts (a very common misunderstanding).  What is important if we want to do normal theory inference is that the coefficients/effects/estimates are normally distributed.  Now since these coefficients can be shown to be linear combinations of the error terms, if the errors are iid normal then the coefficients are also normally distributed.  So many people want to show that the residuals come from a perfectly normal distribution.  But it is the theoretical errors, not the observed residuals that are important (the observed residuals are not iid).  You need to think about the source of your data to see if this is a reasonable assumption.  Now I cannot fathom any universe (theoretical or real) in which normally distributed errors added to means that they are independent of will result in a finite set of integers, so an assumption of exact normality is not reasonable (some may want to argue this, but convincing me will be very difficult).  But looking for exact normality is a bit of a red herring because, we also have the Central Limit Theorem that says that if the errors are not normal (but still iid) then the distribution of the coefficients will approach normality as the sample size increases.  This is what make statistics doable (because no real dataset entered into the computer is exactly normal).  The more important question is are the residuals "normal enough"?  for which there is not a definitive test (experience and plots help).

But this all depends on another assumption that I don't think that you have even considered.  Yes we can use normal theory even when the random part of the data is not normally distributed, but this still assumes that the data is at least interval data, i.e. that we firmly believe that the difference between a response of 1 and a response of 2 is exactly the same as a difference between a 6 and a 7 and that the difference from 4 to 6 is exactly twice that of 1 vs. 2.  From your data and other descriptions, I don't think that that is a reasonable assumption.  If you are not willing to make that assumption (like me) then means and normal theory tests are meaningless and you should use other approaches.  One possibility is to use non-parametric methods (which I believe Frank has already suggested you use), another is to use proportional odds logistic regression.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
project.org] On Behalf Of Frodo Jedi
Sent: Wednesday, January 05, 2011 3:22 PM
To: Robert Baer; r-help at r-project.org
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality

Dear Robert,
thanks so much!!!  Now I understand!
So you also think that I have to check only the residuals and not the
data
directly.
Now just for curiosity I did the the shapiro test on the residuals. The
problem
is that on fit3 I don?t get from the test
that the data are normally distribuited. Why? Here the data:

shapiro.test(residuals(fit1))
    Shapiro-Wilk normality test

data:  residuals(fit1)
W = 0.9848, p-value = 0.05693

#Here the test is ok: the test says that the data are distributed
normally
(p-value greather than 0.05)

shapiro.test(residuals(fit2))
    Shapiro-Wilk normality test

data:  residuals(fit2)
W = 0.9853, p-value = 0.06525

#Here the test is ok: the test says that the data are distributed
normally
(p-value greather than 0.05)

shapiro.test(residuals(fit3))
    Shapiro-Wilk normality test

data:  residuals(fit3)
W = 0.9621, p-value = 0.0001206

Now the test reveals p-value lower than 0.05: so the residuals for fit3
are not
distributed normally....
Why I get this beheaviour? Indeed in the histogram and Q-Q plot for
fit3
residuals I get a normal distribution.

________________________________
From: Robert Baer <rbaer at atsu.edu>

Sent: Wed, January 5, 2011 8:56:50 PM
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality

Someone suggested me that I don?t have to check the normality of the
data, but
the normality of the residuals I get after the fitting of the  linear
model.
I really ask you to help me to understand this point as I don?t find
enough
material online where to solve it.

Try the following:
# using your scrd data and your proposed models
fit1<- lm(response ~ stimulus + condition + stimulus:condition,
data=scrd)
fit2<- lm(response ~ stimulus + condition, data=scrd)
fit3<- lm(response ~ condition, data=scrd)

# Set up for 6 plots on 1 panel
op = par(mfrow=c(2,3))

# residuals function extracts residuals
# Visual inspection is a good start for checking normality
# You get a much better feel than from some "magic number" statistic
hist(residuals(fit1))
hist(residuals(fit2))
hist(residuals(fit3))

# especially qqnorm() plots which are linear for normal data
qqnorm(residuals(fit1))
qqnorm(residuals(fit2))
qqnorm(residuals(fit3))

# Restore plot parameters
par(op)

If the data are not normally distributed I have to use the kruskal
wallys test
and not the ANOVA...so please help
me to understand.

Indeed - Kruskal-Wallis is a good test to use for one factor data that
is
ordinal so it is a good alternative to your fit3.
Your "response" seems to be a discrete variable rather than a
continuous
variable.
You must decide if it is reasonable to approximate it with a normal
distribution
which is by definition continuous.

I make a numerical example, could you please tell me if the data in
this table
are normally distributed or not?

Help!

number                  stimulus condition response
1             flat_550_W_realism         A        3
2             flat_550_W_realism         A        3
3             flat_550_W_realism         A        5
4             flat_550_W_realism         A        3
5             flat_550_W_realism         A        3
6             flat_550_W_realism         A        3
7             flat_550_W_realism         A        3
8             flat_550_W_realism         A        5
9             flat_550_W_realism         A        3
10            flat_550_W_realism         A        3
11            flat_550_W_realism         A        5
12            flat_550_W_realism         A        7
13            flat_550_W_realism         A        5
14            flat_550_W_realism         A        2
15            flat_550_W_realism         A        3
16            flat_550_W_realism        AH        7
17            flat_550_W_realism        AH        4
18            flat_550_W_realism        AH        5
19            flat_550_W_realism        AH        3
20            flat_550_W_realism        AH        6
21            flat_550_W_realism        AH        5
22            flat_550_W_realism        AH        3
23            flat_550_W_realism        AH        5
24            flat_550_W_realism        AH        5
25            flat_550_W_realism        AH        7
26            flat_550_W_realism        AH        2
27            flat_550_W_realism        AH        7
28            flat_550_W_realism        AH        5
29            flat_550_W_realism        AH        5
30         bump_2_step_W_realism         A        1
31         bump_2_step_W_realism         A        3
32         bump_2_step_W_realism         A        5
33         bump_2_step_W_realism         A        1
34         bump_2_step_W_realism         A        3
35         bump_2_step_W_realism         A        2
36         bump_2_step_W_realism         A        5
37         bump_2_step_W_realism         A        4
38         bump_2_step_W_realism         A        4
39         bump_2_step_W_realism         A        4
40         bump_2_step_W_realism         A        4
41         bump_2_step_W_realism        AH        3
42         bump_2_step_W_realism        AH        5
43         bump_2_step_W_realism        AH        1
44         bump_2_step_W_realism        AH        5
45         bump_2_step_W_realism        AH        4
46         bump_2_step_W_realism        AH        4
47         bump_2_step_W_realism        AH        5
48         bump_2_step_W_realism        AH        4
49         bump_2_step_W_realism        AH        3
50         bump_2_step_W_realism        AH        4
51         bump_2_step_W_realism        AH        5
52         bump_2_step_W_realism        AH        4
53         hole_2_step_W_realism         A        3
54         hole_2_step_W_realism         A        3
55         hole_2_step_W_realism         A        4
56         hole_2_step_W_realism         A        1
57         hole_2_step_W_realism         A        4
58         hole_2_step_W_realism         A        3
59         hole_2_step_W_realism         A        5
60         hole_2_step_W_realism         A        4
61         hole_2_step_W_realism         A        3
62         hole_2_step_W_realism         A        4
63         hole_2_step_W_realism         A        7
64         hole_2_step_W_realism         A        5
65         hole_2_step_W_realism         A        1
66         hole_2_step_W_realism         A        4
67         hole_2_step_W_realism        AH        7
68         hole_2_step_W_realism        AH        5
69         hole_2_step_W_realism        AH        5
70         hole_2_step_W_realism        AH        1
71         hole_2_step_W_realism        AH        5
72         hole_2_step_W_realism        AH        5
73         hole_2_step_W_realism        AH        5
74         hole_2_step_W_realism        AH        2
75         hole_2_step_W_realism        AH        6
76         hole_2_step_W_realism        AH        5
77         hole_2_step_W_realism        AH        5
78         hole_2_step_W_realism        AH        6
79     bump_2_heel_toe_W_realism         A        3
80     bump_2_heel_toe_W_realism         A        3
81     bump_2_heel_toe_W_realism         A        3
82     bump_2_heel_toe_W_realism         A        2
83     bump_2_heel_toe_W_realism         A        3
84     bump_2_heel_toe_W_realism         A        3
85     bump_2_heel_toe_W_realism         A        4
86     bump_2_heel_toe_W_realism         A        3
87     bump_2_heel_toe_W_realism         A        4
88     bump_2_heel_toe_W_realism         A        4
89     bump_2_heel_toe_W_realism         A        6
90     bump_2_heel_toe_W_realism         A        5
91     bump_2_heel_toe_W_realism         A        4
92     bump_2_heel_toe_W_realism        AH        7
93     bump_2_heel_toe_W_realism        AH        3
94     bump_2_heel_toe_W_realism        AH        4
95     bump_2_heel_toe_W_realism        AH        2
96     bump_2_heel_toe_W_realism        AH        5
97     bump_2_heel_toe_W_realism        AH        6
98     bump_2_heel_toe_W_realism        AH        4
99     bump_2_heel_toe_W_realism        AH        4
100    bump_2_heel_toe_W_realism        AH        4
101    bump_2_heel_toe_W_realism        AH        5
102    bump_2_heel_toe_W_realism        AH        2
103    bump_2_heel_toe_W_realism        AH        6
104    bump_2_heel_toe_W_realism        AH        5
105    hole_2_heel_toe_W_realism         A        3
106    hole_2_heel_toe_W_realism         A        3
107    hole_2_heel_toe_W_realism         A        1
108    hole_2_heel_toe_W_realism         A        3
109    hole_2_heel_toe_W_realism         A        3
110    hole_2_heel_toe_W_realism         A        5
111    hole_2_heel_toe_W_realism         A        2
112    hole_2_heel_toe_W_realism        AH        5
113    hole_2_heel_toe_W_realism        AH        1
114    hole_2_heel_toe_W_realism        AH        3
115    hole_2_heel_toe_W_realism        AH        6
116    hole_2_heel_toe_W_realism        AH        5
117    hole_2_heel_toe_W_realism        AH        4
118    hole_2_heel_toe_W_realism        AH        4
119    hole_2_heel_toe_W_realism        AH        3
120    hole_2_heel_toe_W_realism        AH        3
121    hole_2_heel_toe_W_realism        AH        1
122    hole_2_heel_toe_W_realism        AH        5
123 bump_2_combination_W_realism         A        4
124 bump_2_combination_W_realism         A        2
125 bump_2_combination_W_realism         A        4
126 bump_2_combination_W_realism         A        1
127 bump_2_combination_W_realism         A        4
128 bump_2_combination_W_realism         A        4
129 bump_2_combination_W_realism         A        2
130 bump_2_combination_W_realism         A        4
131 bump_2_combination_W_realism         A        2
132 bump_2_combination_W_realism         A        4
133 bump_2_combination_W_realism         A        2
134 bump_2_combination_W_realism         A        6
135 bump_2_combination_W_realism        AH        7
136 bump_2_combination_W_realism        AH        3
137 bump_2_combination_W_realism        AH        4
138 bump_2_combination_W_realism        AH        1
139 bump_2_combination_W_realism        AH        6
140 bump_2_combination_W_realism        AH        5
141 bump_2_combination_W_realism        AH        5
142 bump_2_combination_W_realism        AH        6
143 bump_2_combination_W_realism        AH        5
144 bump_2_combination_W_realism        AH        4
145 bump_2_combination_W_realism        AH        2
146 bump_2_combination_W_realism        AH        4
147 bump_2_combination_W_realism        AH        2
148 bump_2_combination_W_realism        AH        5
149 hole_2_combination_W_realism         A        5
150 hole_2_combination_W_realism         A        2
151 hole_2_combination_W_realism         A        4
152 hole_2_combination_W_realism         A        1
153 hole_2_combination_W_realism         A        5
154 hole_2_combination_W_realism         A        4
155 hole_2_combination_W_realism         A        3
156 hole_2_combination_W_realism         A        5
157 hole_2_combination_W_realism         A        2
158 hole_2_combination_W_realism         A        5
159 hole_2_combination_W_realism         A        5
160 hole_2_combination_W_realism         A        1
161 hole_2_combination_W_realism        AH        7
162 hole_2_combination_W_realism        AH        5
163 hole_2_combination_W_realism        AH        3
164 hole_2_combination_W_realism        AH        1
165 hole_2_combination_W_realism        AH        6
166 hole_2_combination_W_realism        AH        4
167 hole_2_combination_W_realism        AH        7
168 hole_2_combination_W_realism        AH        5
169 hole_2_combination_W_realism        AH        5
170 hole_2_combination_W_realism        AH        2
171 hole_2_combination_W_realism        AH        6
172 hole_2_combination_W_realism        AH        2
173 hole_2_combination_W_realism        AH        4

Thanks in advance

[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.

      [[alternative HTML version deleted]]
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110106/2b0f67f8/attachment.pl>
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110106/b2391f0c/attachment.pl>
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110106/541a76fe/attachment.pl>
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110106/b6853206/attachment.pl>
A lot of this depends on what question you are really trying to answer.  For one way anova replacing y-values with their ranks essentially transforms the distribution to uniform (under the null) and the Central Limit Theorem kicks in for the uniform with samples larger than about 5, so the normal approximations are pretty good and the theory works, but what are you actually testing?  The most meaningful null that is being tested is that all data come from the exact same distribution.  So what does it mean when you reject that null?  It means that all the groups are not representing the same distribution, but is that because the means differ? Or the variances? Or the shapes? It can be any of those.  Some point out that if you make certain assumptions such as symmetry or shifts of the same distributions, then you can talk about differences in means or medians, but usually if I am using non-parametrics it is because I don't believe that things are symmetric and the shift idea doesn't fit in my mind.

Some alternatives include bootstrapping or permutation tests, or just transforming the data to get something closer to normal.

Now what does replacing by ranks do in 2-way anova where we want to test the difference in one factor without making assumptions about whether the other factor has an effect or not?  I'm not sure on this one.

I have seen regression on ranks, it basically tests for some level of relationship, but regression is usually used for some type of prediction and predicting from a rank-rank regression does not seem meaningful to me.

Fitting the regression model does not require normality, it is the tests on the coefficients and confidence and prediction intervals that assume normality (again the CLT helps for large samples (but not for prediction intervals)).  Bootstrapping is an option for regression without assuming normality, transformations can also help.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
project.org] On Behalf Of Ben Ward
Sent: Thursday, January 06, 2011 2:00 PM
To: r-help at r-project.org
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality

On 06/01/2011 20:29, Greg Snow wrote:
Some would argue to always use the kruskal wallis test since we never
know for sure if we have normality.  Personally I am not sure that I
understand what exactly that test is really testing.  Plus in your case
you are doing a two-way anova and kruskal.test does one-way, so it will
not work for your case.  There are other non-parametric options.
Just read this and had queries of my own and comments on this subject:
Would one of these options be to rank the data before doing whatever
model or test you want to do? As I understand it makes the place of the
data the same, but pulls extreme cases closer to the rest. Not an
expert
though.
I've been doing lm() for my work, and I don't know if that makes an
assumption of normality (may data is not normal). And I'm unsure of any
other assumptions as my texts don't really discuss them. Although I can
comfortably evaluate a model say using residual vs fitted, and F values
turned to P, resampling and confidence intervals, and looking at sums
of
squares terms add to explanation of the model. I've tried the plot()
function to help graphically evaluate a model, and I want to make sure
I
understand what it's showing me. I think the first, is showing me the
models fitted values vs the residuals, and ideally, I think the closer
the points are to the red line the better. The next plot is a Q-Q plot,
the closer the points to the line, the more normal the model
coefficients (or perhaps the data). I'm not sure what the next two
plots
are, but it is titled Scale-Location. And it looks to have the square
root of standardized residuals on y, and fitted model values on x.
Might
this be similar to the first plot? The final one is titled Residuals vs
Leverage, which has standardized residuals on y and leverage on x, and
something called Cooks Distance is plotted as well.

Thanks,
Ben. W
Whether to use anova and other normality based tests is really a
matter of what assumptions you are willing to live with and what level
of "close enough" you are comfortable with.  Consulting with a local
consultant with experience in these areas is useful if you don't have
enough experience to decide what you are comfortable with.
For your description, I would try the proportional odds logistic
regression, but again, you should probably consult with someone who has
experience rather than trying that on your own until you have more
training and experience.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

From: Frodo Jedi [mailto:frodo.jedi at yahoo.com]
Sent: Thursday, January 06, 2011 12:57 PM
To: Greg Snow; r-help at r-project.org
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality

Ok,
I see ;-)

Let?s put in this way then. When do I have to use the kruskal wallis
test? I mean, when I am very sure that I have
to use it instead of ANOVA?

Thanks

Best regards

P.S.  In addition, which is the non parametric methods corresponding
to a 2 ways anova?..or have I to
repeat many times the kruskal wallis test?
________________________________
From: Greg Snow<Greg.Snow at imail.org>
To: Frodo Jedi<frodo.jedi at yahoo.com>; Robert Baer<rbaer at atsu.edu>;
"r-help at r-project.org"<r-help at r-project.org>
Sent: Thu, January 6, 2011 7:07:17 PM
Subject: RE: [R] Assumptions for ANOVA: the right way to check the
normality
Remember that an non-significant result (especially one that is still
near alpha like yours) does not give evidence that the null is true.
The reason that the 1st 2 tests below don't show significance is more
due to lack of power than some of the residuals being normal.  The only
test that I would trust for this is SnowsPenultimateNormalityTest
(TeachingDemos package, the help page is more useful than the function
itself).
But I think that you are mixing up 2 different concepts (a very
common misunderstanding).  What is important if we want to do normal
theory inference is that the coefficients/effects/estimates are
normally distributed.  Now since these coefficients can be shown to be
linear combinations of the error terms, if the errors are iid normal
then the coefficients are also normally distributed.  So many people
want to show that the residuals come from a perfectly normal
distribution.  But it is the theoretical errors, not the observed
residuals that are important (the observed residuals are not iid).  You
need to think about the source of your data to see if this is a
reasonable assumption.  Now I cannot fathom any universe (theoretical
or real) in which normally distributed errors added to means that they
are independent of will result in a finite set of integers, so an
assumption of exact normality is not reasonable (some may want to argue
this, but convincing me will be very difficult).  But looking for exact
normality is a bit of a red herring because, we also have the Central
Limit Theorem that says that if the errors are not normal (but still
iid) then the distribution of the coefficients will approach normality
as the sample size increases.  This is what make statistics doable
(because no real dataset entered into the computer is exactly normal).
The more important question is are the residuals "normal enough"?  for
which there is not a definitive test (experience and plots help).
But this all depends on another assumption that I don't think that
you have even considered.  Yes we can use normal theory even when the
random part of the data is not normally distributed, but this still
assumes that the data is at least interval data, i.e. that we firmly
believe that the difference between a response of 1 and a response of 2
is exactly the same as a difference between a 6 and a 7 and that the
difference from 4 to 6 is exactly twice that of 1 vs. 2.  From your
data and other descriptions, I don't think that that is a reasonable
assumption.  If you are not willing to make that assumption (like me)
then means and normal theory tests are meaningless and you should use
other approaches.  One possibility is to use non-parametric methods
(which I believe Frank has already suggested you use), another is to
use proportional odds logistic regression.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org<mailto:greg.snow at imail.org>
801.408.8111

-----Original Message-----
From: r-help-bounces at r-project.org<mailto:r-help-bounces at r-
project.org>  [mailto:r-help-bounces at r-
project.org<http://project.org>] On Behalf Of Frodo Jedi
Sent: Wednesday, January 05, 2011 3:22 PM
To: Robert Baer; r-help at r-project.org<mailto:r-help at r-project.org>
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality

Dear Robert,
thanks so much!!!  Now I understand!
So you also think that I have to check only the residuals and not
the
data
directly.
Now just for curiosity I did the the shapiro test on the residuals.
The
problem
is that on fit3 I don?t get from the test
that the data are normally distribuited. Why? Here the data:

shapiro.test(residuals(fit1))
    Shapiro-Wilk normality test

data:  residuals(fit1)
W = 0.9848, p-value = 0.05693

#Here the test is ok: the test says that the data are distributed
normally
(p-value greather than 0.05)

shapiro.test(residuals(fit2))
    Shapiro-Wilk normality test

data:  residuals(fit2)
W = 0.9853, p-value = 0.06525

#Here the test is ok: the test says that the data are distributed
normally
(p-value greather than 0.05)

shapiro.test(residuals(fit3))
    Shapiro-Wilk normality test

data:  residuals(fit3)
W = 0.9621, p-value = 0.0001206

Now the test reveals p-value lower than 0.05: so the residuals for
fit3
are not
distributed normally....
Why I get this beheaviour? Indeed in the histogram and Q-Q plot for
fit3
residuals I get a normal distribution.

________________________________
From: Robert Baer<rbaer at atsu.edu<mailto:rbaer at atsu.edu>>

Sent: Wed, January 5, 2011 8:56:50 PM
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality

Someone suggested me that I don?t have to check the normality of
the
data, but
the normality of the residuals I get after the fitting of the
linear
model.
I really ask you to help me to understand this point as I don?t
find
enough
material online where to solve it.
Try the following:
# using your scrd data and your proposed models
fit1<- lm(response ~ stimulus + condition + stimulus:condition,
data=scrd)
fit2<- lm(response ~ stimulus + condition, data=scrd)
fit3<- lm(response ~ condition, data=scrd)

# Set up for 6 plots on 1 panel
op = par(mfrow=c(2,3))

# residuals function extracts residuals
# Visual inspection is a good start for checking normality
# You get a much better feel than from some "magic number" statistic
hist(residuals(fit1))
hist(residuals(fit2))
hist(residuals(fit3))

# especially qqnorm() plots which are linear for normal data
qqnorm(residuals(fit1))
qqnorm(residuals(fit2))
qqnorm(residuals(fit3))

# Restore plot parameters
par(op)

If the data are not normally distributed I have to use the kruskal
wallys test
and not the ANOVA...so please help
me to understand.
Indeed - Kruskal-Wallis is a good test to use for one factor data
that
is
ordinal so it is a good alternative to your fit3.
Your "response" seems to be a discrete variable rather than a
continuous
variable.
You must decide if it is reasonable to approximate it with a normal
distribution
which is by definition continuous.

I make a numerical example, could you please tell me if the data in
this table
are normally distributed or not?

Help!

number                  stimulus condition response
1            flat_550_W_realism        A        3
2            flat_550_W_realism        A        3
3            flat_550_W_realism        A        5
4            flat_550_W_realism        A        3
5            flat_550_W_realism        A        3
6            flat_550_W_realism        A        3
7            flat_550_W_realism        A        3
8            flat_550_W_realism        A        5
9            flat_550_W_realism        A        3
10            flat_550_W_realism        A        3
11            flat_550_W_realism        A        5
12            flat_550_W_realism        A        7
13            flat_550_W_realism        A        5
14            flat_550_W_realism        A        2
15            flat_550_W_realism        A        3
16            flat_550_W_realism        AH        7
17            flat_550_W_realism        AH        4
18            flat_550_W_realism        AH        5
19            flat_550_W_realism        AH        3
20            flat_550_W_realism        AH        6
21            flat_550_W_realism        AH        5
22            flat_550_W_realism        AH        3
23            flat_550_W_realism        AH        5
24            flat_550_W_realism        AH        5
25            flat_550_W_realism        AH        7
26            flat_550_W_realism        AH        2
27            flat_550_W_realism        AH        7
28            flat_550_W_realism        AH        5
29            flat_550_W_realism        AH        5
30        bump_2_step_W_realism        A        1
31        bump_2_step_W_realism        A        3
32        bump_2_step_W_realism        A        5
33        bump_2_step_W_realism        A        1
34        bump_2_step_W_realism        A        3
35        bump_2_step_W_realism        A        2
36        bump_2_step_W_realism        A        5
37        bump_2_step_W_realism        A        4
38        bump_2_step_W_realism        A        4
39        bump_2_step_W_realism        A        4
40        bump_2_step_W_realism        A        4
41        bump_2_step_W_realism        AH        3
42        bump_2_step_W_realism        AH        5
43        bump_2_step_W_realism        AH        1
44        bump_2_step_W_realism        AH        5
45        bump_2_step_W_realism        AH        4
46        bump_2_step_W_realism        AH        4
47        bump_2_step_W_realism        AH        5
48        bump_2_step_W_realism        AH        4
49        bump_2_step_W_realism        AH        3
50        bump_2_step_W_realism        AH        4
51        bump_2_step_W_realism        AH        5
52        bump_2_step_W_realism        AH        4
53        hole_2_step_W_realism        A        3
54        hole_2_step_W_realism        A        3
55        hole_2_step_W_realism        A        4
56        hole_2_step_W_realism        A        1
57        hole_2_step_W_realism        A        4
58        hole_2_step_W_realism        A        3
59        hole_2_step_W_realism        A        5
60        hole_2_step_W_realism        A        4
61        hole_2_step_W_realism        A        3
62        hole_2_step_W_realism        A        4
63        hole_2_step_W_realism        A        7
64        hole_2_step_W_realism        A        5
65        hole_2_step_W_realism        A        1
66        hole_2_step_W_realism        A        4
67        hole_2_step_W_realism        AH        7
68        hole_2_step_W_realism        AH        5
69        hole_2_step_W_realism        AH        5
70        hole_2_step_W_realism        AH        1
71        hole_2_step_W_realism        AH        5
72        hole_2_step_W_realism        AH        5
73        hole_2_step_W_realism        AH        5
74        hole_2_step_W_realism        AH        2
75        hole_2_step_W_realism        AH        6
76        hole_2_step_W_realism        AH        5
77        hole_2_step_W_realism        AH        5
78        hole_2_step_W_realism        AH        6
79    bump_2_heel_toe_W_realism        A        3
80    bump_2_heel_toe_W_realism        A        3
81    bump_2_heel_toe_W_realism        A        3
82    bump_2_heel_toe_W_realism        A        2
83    bump_2_heel_toe_W_realism        A        3
84    bump_2_heel_toe_W_realism        A        3
85    bump_2_heel_toe_W_realism        A        4
86    bump_2_heel_toe_W_realism        A        3
87    bump_2_heel_toe_W_realism        A        4
88    bump_2_heel_toe_W_realism        A        4
89    bump_2_heel_toe_W_realism        A        6
90    bump_2_heel_toe_W_realism        A        5
91    bump_2_heel_toe_W_realism        A        4
92    bump_2_heel_toe_W_realism        AH        7
93    bump_2_heel_toe_W_realism        AH        3
94    bump_2_heel_toe_W_realism        AH        4
95    bump_2_heel_toe_W_realism        AH        2
96    bump_2_heel_toe_W_realism        AH        5
97    bump_2_heel_toe_W_realism        AH        6
98    bump_2_heel_toe_W_realism        AH        4
99    bump_2_heel_toe_W_realism        AH        4
100    bump_2_heel_toe_W_realism        AH        4
101    bump_2_heel_toe_W_realism        AH        5
102    bump_2_heel_toe_W_realism        AH        2
103    bump_2_heel_toe_W_realism        AH        6
104    bump_2_heel_toe_W_realism        AH        5
105    hole_2_heel_toe_W_realism        A        3
106    hole_2_heel_toe_W_realism        A        3
107    hole_2_heel_toe_W_realism        A        1
108    hole_2_heel_toe_W_realism        A        3
109    hole_2_heel_toe_W_realism        A        3
110    hole_2_heel_toe_W_realism        A        5
111    hole_2_heel_toe_W_realism        A        2
112    hole_2_heel_toe_W_realism        AH        5
113    hole_2_heel_toe_W_realism        AH        1
114    hole_2_heel_toe_W_realism        AH        3
115    hole_2_heel_toe_W_realism        AH        6
116    hole_2_heel_toe_W_realism        AH        5
117    hole_2_heel_toe_W_realism        AH        4
118    hole_2_heel_toe_W_realism        AH        4
119    hole_2_heel_toe_W_realism        AH        3
120    hole_2_heel_toe_W_realism        AH        3
121    hole_2_heel_toe_W_realism        AH        1
122    hole_2_heel_toe_W_realism        AH        5
123 bump_2_combination_W_realism        A        4
124 bump_2_combination_W_realism        A        2
125 bump_2_combination_W_realism        A        4
126 bump_2_combination_W_realism        A        1
127 bump_2_combination_W_realism        A        4
128 bump_2_combination_W_realism        A        4
129 bump_2_combination_W_realism        A        2
130 bump_2_combination_W_realism        A        4
131 bump_2_combination_W_realism        A        2
132 bump_2_combination_W_realism        A        4
133 bump_2_combination_W_realism        A        2
134 bump_2_combination_W_realism        A        6
135 bump_2_combination_W_realism        AH        7
136 bump_2_combination_W_realism        AH        3
137 bump_2_combination_W_realism        AH        4
138 bump_2_combination_W_realism        AH        1
139 bump_2_combination_W_realism        AH        6
140 bump_2_combination_W_realism        AH        5
141 bump_2_combination_W_realism        AH        5
142 bump_2_combination_W_realism        AH        6
143 bump_2_combination_W_realism        AH        5
144 bump_2_combination_W_realism        AH        4
145 bump_2_combination_W_realism        AH        2
146 bump_2_combination_W_realism        AH        4
147 bump_2_combination_W_realism        AH        2
148 bump_2_combination_W_realism        AH        5
149 hole_2_combination_W_realism        A        5
150 hole_2_combination_W_realism        A        2
151 hole_2_combination_W_realism        A        4
152 hole_2_combination_W_realism        A        1
153 hole_2_combination_W_realism        A        5
154 hole_2_combination_W_realism        A        4
155 hole_2_combination_W_realism        A        3
156 hole_2_combination_W_realism        A        5
157 hole_2_combination_W_realism        A        2
158 hole_2_combination_W_realism        A        5
159 hole_2_combination_W_realism        A        5
160 hole_2_combination_W_realism        A        1
161 hole_2_combination_W_realism        AH        7
162 hole_2_combination_W_realism        AH        5
163 hole_2_combination_W_realism        AH        3
164 hole_2_combination_W_realism        AH        1
165 hole_2_combination_W_realism        AH        6
166 hole_2_combination_W_realism        AH        4
167 hole_2_combination_W_realism        AH        7
168 hole_2_combination_W_realism        AH        5
169 hole_2_combination_W_realism        AH        5
170 hole_2_combination_W_realism        AH        2
171 hole_2_combination_W_realism        AH        6
172 hole_2_combination_W_realism        AH        2
173 hole_2_combination_W_realism        AH        4

Thanks in advance

[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org<mailto:R-help at r-project.org>  mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.

      [[alternative HTML version deleted]]
    [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.

      [[alternative HTML version deleted]]
I believe what I'm doing, is an ancova, because I have two categorical 
and a numerical explanatory variables, and a numerical response variable 
(this is the same experiment as before, the bacteria), and I'm just, at 
the minute (because I'm only half way through), doing some modelling and 
seeing what I get with what I currently have. And I'm paying attention 
to 95% CI for the different terms of a model, as well as the 
coefficient, and the explanatory power of the term and likelyhood that 
the same result could be obtained at random through the P values, 
derived from F. To be honest I havent checked much what my data 
distributions are like and such becasue I'm not finished collecting it 
yet. I mainly mentioned the ranking because it was given considerable 
mention in one of my texts sections on hypothesis testing on models.
A lot of this depends on what question you are really trying to answer.  For one way anova replacing y-values with their ranks essentially transforms the distribution to uniform (under the null) and the Central Limit Theorem kicks in for the uniform with samples larger than about 5, so the normal approximations are pretty good and the theory works, but what are you actually testing?  The most meaningful null that is being tested is that all data come from the exact same distribution.  So what does it mean when you reject that null?  It means that all the groups are not representing the same distribution, but is that because the means differ? Or the variances? Or the shapes? It can be any of those.  Some point out that if you make certain assumptions such as symmetry or shifts of the same distributions, then you can talk about differences in means or medians, but usually if I am using non-parametrics it is because I don't believe that things are symmetric and the shift idea doesn't fit in my mind.

Some alternatives include bootstrapping or permutation tests, or just transforming the data to get something closer to normal.

Now what does replacing by ranks do in 2-way anova where we want to test the difference in one factor without making assumptions about whether the other factor has an effect or not?  I'm not sure on this one.

I have seen regression on ranks, it basically tests for some level of relationship, but regression is usually used for some type of prediction and predicting from a rank-rank regression does not seem meaningful to me.

Fitting the regression model does not require normality, it is the tests on the coefficients and confidence and prediction intervals that assume normality (again the CLT helps for large samples (but not for prediction intervals)).  Bootstrapping is an option for regression without assuming normality, transformations can also help.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
project.org] On Behalf Of Ben Ward
Sent: Thursday, January 06, 2011 2:00 PM
To: r-help at r-project.org
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality

On 06/01/2011 20:29, Greg Snow wrote:
Some would argue to always use the kruskal wallis test since we never
know for sure if we have normality.  Personally I am not sure that I
understand what exactly that test is really testing.  Plus in your case
you are doing a two-way anova and kruskal.test does one-way, so it will
not work for your case.  There are other non-parametric options.
Just read this and had queries of my own and comments on this subject:
Would one of these options be to rank the data before doing whatever
model or test you want to do? As I understand it makes the place of the
data the same, but pulls extreme cases closer to the rest. Not an
expert
though.
I've been doing lm() for my work, and I don't know if that makes an
assumption of normality (may data is not normal). And I'm unsure of any
other assumptions as my texts don't really discuss them. Although I can
comfortably evaluate a model say using residual vs fitted, and F values
turned to P, resampling and confidence intervals, and looking at sums
of
squares terms add to explanation of the model. I've tried the plot()
function to help graphically evaluate a model, and I want to make sure
I
understand what it's showing me. I think the first, is showing me the
models fitted values vs the residuals, and ideally, I think the closer
the points are to the red line the better. The next plot is a Q-Q plot,
the closer the points to the line, the more normal the model
coefficients (or perhaps the data). I'm not sure what the next two
plots
are, but it is titled Scale-Location. And it looks to have the square
root of standardized residuals on y, and fitted model values on x.
Might
this be similar to the first plot? The final one is titled Residuals vs
Leverage, which has standardized residuals on y and leverage on x, and
something called Cooks Distance is plotted as well.

Thanks,
Ben. W
Whether to use anova and other normality based tests is really a
matter of what assumptions you are willing to live with and what level
of "close enough" you are comfortable with.  Consulting with a local
consultant with experience in these areas is useful if you don't have
enough experience to decide what you are comfortable with.
For your description, I would try the proportional odds logistic
regression, but again, you should probably consult with someone who has
experience rather than trying that on your own until you have more
training and experience.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

From: Frodo Jedi [mailto:frodo.jedi at yahoo.com]
Sent: Thursday, January 06, 2011 12:57 PM
To: Greg Snow; r-help at r-project.org
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality
Ok,
I see ;-)

Let?s put in this way then. When do I have to use the kruskal wallis
test? I mean, when I am very sure that I have
to use it instead of ANOVA?

Thanks

Best regards

P.S.  In addition, which is the non parametric methods corresponding
to a 2 ways anova?..or have I to
repeat many times the kruskal wallis test?
________________________________
From: Greg Snow<Greg.Snow at imail.org>
To: Frodo Jedi<frodo.jedi at yahoo.com>; Robert Baer<rbaer at atsu.edu>;
"r-help at r-project.org"<r-help at r-project.org>
Sent: Thu, January 6, 2011 7:07:17 PM
Subject: RE: [R] Assumptions for ANOVA: the right way to check the
normality
Remember that an non-significant result (especially one that is still
near alpha like yours) does not give evidence that the null is true.
The reason that the 1st 2 tests below don't show significance is more
due to lack of power than some of the residuals being normal.  The only
test that I would trust for this is SnowsPenultimateNormalityTest
(TeachingDemos package, the help page is more useful than the function
itself).
But I think that you are mixing up 2 different concepts (a very
common misunderstanding).  What is important if we want to do normal
theory inference is that the coefficients/effects/estimates are
normally distributed.  Now since these coefficients can be shown to be
linear combinations of the error terms, if the errors are iid normal
then the coefficients are also normally distributed.  So many people
want to show that the residuals come from a perfectly normal
distribution.  But it is the theoretical errors, not the observed
residuals that are important (the observed residuals are not iid).  You
need to think about the source of your data to see if this is a
reasonable assumption.  Now I cannot fathom any universe (theoretical
or real) in which normally distributed errors added to means that they
are independent of will result in a finite set of integers, so an
assumption of exact normality is not reasonable (some may want to argue
this, but convincing me will be very difficult).  But looking for exact
normality is a bit of a red herring because, we also have the Central
Limit Theorem that says that if the errors are not normal (but still
iid) then the distribution of the coefficients will approach normality
as the sample size increases.  This is what make statistics doable
(because no real dataset entered into the computer is exactly normal).
The more important question is are the residuals "normal enough"?  for
which there is not a definitive test (experience and plots help).
But this all depends on another assumption that I don't think that
you have even considered.  Yes we can use normal theory even when the
random part of the data is not normally distributed, but this still
assumes that the data is at least interval data, i.e. that we firmly
believe that the difference between a response of 1 and a response of 2
is exactly the same as a difference between a 6 and a 7 and that the
difference from 4 to 6 is exactly twice that of 1 vs. 2.  From your
data and other descriptions, I don't think that that is a reasonable
assumption.  If you are not willing to make that assumption (like me)
then means and normal theory tests are meaningless and you should use
other approaches.  One possibility is to use non-parametric methods
(which I believe Frank has already suggested you use), another is to
use proportional odds logistic regression.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org<mailto:greg.snow at imail.org>
801.408.8111

-----Original Message-----
From: r-help-bounces at r-project.org<mailto:r-help-bounces at r-
project.org>   [mailto:r-help-bounces at r-
project.org<http://project.org>] On Behalf Of Frodo Jedi
Sent: Wednesday, January 05, 2011 3:22 PM
To: Robert Baer; r-help at r-project.org<mailto:r-help at r-project.org>
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality

Dear Robert,
thanks so much!!!  Now I understand!
So you also think that I have to check only the residuals and not
the
data
directly.
Now just for curiosity I did the the shapiro test on the residuals.
The
problem
is that on fit3 I don?t get from the test
that the data are normally distribuited. Why? Here the data:

shapiro.test(residuals(fit1))
     Shapiro-Wilk normality test

data:  residuals(fit1)
W = 0.9848, p-value = 0.05693

#Here the test is ok: the test says that the data are distributed
normally
(p-value greather than 0.05)

shapiro.test(residuals(fit2))
     Shapiro-Wilk normality test

data:  residuals(fit2)
W = 0.9853, p-value = 0.06525

#Here the test is ok: the test says that the data are distributed
normally
(p-value greather than 0.05)

shapiro.test(residuals(fit3))
     Shapiro-Wilk normality test

data:  residuals(fit3)
W = 0.9621, p-value = 0.0001206

Now the test reveals p-value lower than 0.05: so the residuals for
fit3
are not
distributed normally....
Why I get this beheaviour? Indeed in the histogram and Q-Q plot for
fit3
residuals I get a normal distribution.

________________________________
From: Robert Baer<rbaer at atsu.edu<mailto:rbaer at atsu.edu>>

Sent: Wed, January 5, 2011 8:56:50 PM
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality

Someone suggested me that I don?t have to check the normality of
the
data, but
the normality of the residuals I get after the fitting of the
linear
model.
I really ask you to help me to understand this point as I don?t
find
enough
material online where to solve it.
Try the following:
# using your scrd data and your proposed models
fit1<- lm(response ~ stimulus + condition + stimulus:condition,
data=scrd)
fit2<- lm(response ~ stimulus + condition, data=scrd)
fit3<- lm(response ~ condition, data=scrd)

# Set up for 6 plots on 1 panel
op = par(mfrow=c(2,3))

# residuals function extracts residuals
# Visual inspection is a good start for checking normality
# You get a much better feel than from some "magic number" statistic
hist(residuals(fit1))
hist(residuals(fit2))
hist(residuals(fit3))

# especially qqnorm() plots which are linear for normal data
qqnorm(residuals(fit1))
qqnorm(residuals(fit2))
qqnorm(residuals(fit3))

# Restore plot parameters
par(op)

If the data are not normally distributed I have to use the kruskal
wallys test
and not the ANOVA...so please help
me to understand.
Indeed - Kruskal-Wallis is a good test to use for one factor data
that
is
ordinal so it is a good alternative to your fit3.
Your "response" seems to be a discrete variable rather than a
continuous
variable.
You must decide if it is reasonable to approximate it with a normal
distribution
which is by definition continuous.

I make a numerical example, could you please tell me if the data in
this table
are normally distributed or not?

Help!

number                  stimulus condition response
1            flat_550_W_realism        A        3
2            flat_550_W_realism        A        3
3            flat_550_W_realism        A        5
4            flat_550_W_realism        A        3
5            flat_550_W_realism        A        3
6            flat_550_W_realism        A        3
7            flat_550_W_realism        A        3
8            flat_550_W_realism        A        5
9            flat_550_W_realism        A        3
10            flat_550_W_realism        A        3
11            flat_550_W_realism        A        5
12            flat_550_W_realism        A        7
13            flat_550_W_realism        A        5
14            flat_550_W_realism        A        2
15            flat_550_W_realism        A        3
16            flat_550_W_realism        AH        7
17            flat_550_W_realism        AH        4
18            flat_550_W_realism        AH        5
19            flat_550_W_realism        AH        3
20            flat_550_W_realism        AH        6
21            flat_550_W_realism        AH        5
22            flat_550_W_realism        AH        3
23            flat_550_W_realism        AH        5
24            flat_550_W_realism        AH        5
25            flat_550_W_realism        AH        7
26            flat_550_W_realism        AH        2
27            flat_550_W_realism        AH        7
28            flat_550_W_realism        AH        5
29            flat_550_W_realism        AH        5
30        bump_2_step_W_realism        A        1
31        bump_2_step_W_realism        A        3
32        bump_2_step_W_realism        A        5
33        bump_2_step_W_realism        A        1
34        bump_2_step_W_realism        A        3
35        bump_2_step_W_realism        A        2
36        bump_2_step_W_realism        A        5
37        bump_2_step_W_realism        A        4
38        bump_2_step_W_realism        A        4
39        bump_2_step_W_realism        A        4
40        bump_2_step_W_realism        A        4
41        bump_2_step_W_realism        AH        3
42        bump_2_step_W_realism        AH        5
43        bump_2_step_W_realism        AH        1
44        bump_2_step_W_realism        AH        5
45        bump_2_step_W_realism        AH        4
46        bump_2_step_W_realism        AH        4
47        bump_2_step_W_realism        AH        5
48        bump_2_step_W_realism        AH        4
49        bump_2_step_W_realism        AH        3
50        bump_2_step_W_realism        AH        4
51        bump_2_step_W_realism        AH        5
52        bump_2_step_W_realism        AH        4
53        hole_2_step_W_realism        A        3
54        hole_2_step_W_realism        A        3
55        hole_2_step_W_realism        A        4
56        hole_2_step_W_realism        A        1
57        hole_2_step_W_realism        A        4
58        hole_2_step_W_realism        A        3
59        hole_2_step_W_realism        A        5
60        hole_2_step_W_realism        A        4
61        hole_2_step_W_realism        A        3
62        hole_2_step_W_realism        A        4
63        hole_2_step_W_realism        A        7
64        hole_2_step_W_realism        A        5
65        hole_2_step_W_realism        A        1
66        hole_2_step_W_realism        A        4
67        hole_2_step_W_realism        AH        7
68        hole_2_step_W_realism        AH        5
69        hole_2_step_W_realism        AH        5
70        hole_2_step_W_realism        AH        1
71        hole_2_step_W_realism        AH        5
72        hole_2_step_W_realism        AH        5
73        hole_2_step_W_realism        AH        5
74        hole_2_step_W_realism        AH        2
75        hole_2_step_W_realism        AH        6
76        hole_2_step_W_realism        AH        5
77        hole_2_step_W_realism        AH        5
78        hole_2_step_W_realism        AH        6
79    bump_2_heel_toe_W_realism        A        3
80    bump_2_heel_toe_W_realism        A        3
81    bump_2_heel_toe_W_realism        A        3
82    bump_2_heel_toe_W_realism        A        2
83    bump_2_heel_toe_W_realism        A        3
84    bump_2_heel_toe_W_realism        A        3
85    bump_2_heel_toe_W_realism        A        4
86    bump_2_heel_toe_W_realism        A        3
87    bump_2_heel_toe_W_realism        A        4
88    bump_2_heel_toe_W_realism        A        4
89    bump_2_heel_toe_W_realism        A        6
90    bump_2_heel_toe_W_realism        A        5
91    bump_2_heel_toe_W_realism        A        4
92    bump_2_heel_toe_W_realism        AH        7
93    bump_2_heel_toe_W_realism        AH        3
94    bump_2_heel_toe_W_realism        AH        4
95    bump_2_heel_toe_W_realism        AH        2
96    bump_2_heel_toe_W_realism        AH        5
97    bump_2_heel_toe_W_realism        AH        6
98    bump_2_heel_toe_W_realism        AH        4
99    bump_2_heel_toe_W_realism        AH        4
100    bump_2_heel_toe_W_realism        AH        4
101    bump_2_heel_toe_W_realism        AH        5
102    bump_2_heel_toe_W_realism        AH        2
103    bump_2_heel_toe_W_realism        AH        6
104    bump_2_heel_toe_W_realism        AH        5
105    hole_2_heel_toe_W_realism        A        3
106    hole_2_heel_toe_W_realism        A        3
107    hole_2_heel_toe_W_realism        A        1
108    hole_2_heel_toe_W_realism        A        3
109    hole_2_heel_toe_W_realism        A        3
110    hole_2_heel_toe_W_realism        A        5
111    hole_2_heel_toe_W_realism        A        2
112    hole_2_heel_toe_W_realism        AH        5
113    hole_2_heel_toe_W_realism        AH        1
114    hole_2_heel_toe_W_realism        AH        3
115    hole_2_heel_toe_W_realism        AH        6
116    hole_2_heel_toe_W_realism        AH        5
117    hole_2_heel_toe_W_realism        AH        4
118    hole_2_heel_toe_W_realism        AH        4
119    hole_2_heel_toe_W_realism        AH        3
120    hole_2_heel_toe_W_realism        AH        3
121    hole_2_heel_toe_W_realism        AH        1
122    hole_2_heel_toe_W_realism        AH        5
123 bump_2_combination_W_realism        A        4
124 bump_2_combination_W_realism        A        2
125 bump_2_combination_W_realism        A        4
126 bump_2_combination_W_realism        A        1
127 bump_2_combination_W_realism        A        4
128 bump_2_combination_W_realism        A        4
129 bump_2_combination_W_realism        A        2
130 bump_2_combination_W_realism        A        4
131 bump_2_combination_W_realism        A        2
132 bump_2_combination_W_realism        A        4
133 bump_2_combination_W_realism        A        2
134 bump_2_combination_W_realism        A        6
135 bump_2_combination_W_realism        AH        7
136 bump_2_combination_W_realism        AH        3
137 bump_2_combination_W_realism        AH        4
138 bump_2_combination_W_realism        AH        1
139 bump_2_combination_W_realism        AH        6
140 bump_2_combination_W_realism        AH        5
141 bump_2_combination_W_realism        AH        5
142 bump_2_combination_W_realism        AH        6
143 bump_2_combination_W_realism        AH        5
144 bump_2_combination_W_realism        AH        4
145 bump_2_combination_W_realism        AH        2
146 bump_2_combination_W_realism        AH        4
147 bump_2_combination_W_realism        AH        2
148 bump_2_combination_W_realism        AH        5
149 hole_2_combination_W_realism        A        5
150 hole_2_combination_W_realism        A        2
151 hole_2_combination_W_realism        A        4
152 hole_2_combination_W_realism        A        1
153 hole_2_combination_W_realism        A        5
154 hole_2_combination_W_realism        A        4
155 hole_2_combination_W_realism        A        3
156 hole_2_combination_W_realism        A        5
157 hole_2_combination_W_realism        A        2
158 hole_2_combination_W_realism        A        5
159 hole_2_combination_W_realism        A        5
160 hole_2_combination_W_realism        A        1
161 hole_2_combination_W_realism        AH        7
162 hole_2_combination_W_realism        AH        5
163 hole_2_combination_W_realism        AH        3
164 hole_2_combination_W_realism        AH        1
165 hole_2_combination_W_realism        AH        6
166 hole_2_combination_W_realism        AH        4
167 hole_2_combination_W_realism        AH        7
168 hole_2_combination_W_realism        AH        5
169 hole_2_combination_W_realism        AH        5
170 hole_2_combination_W_realism        AH        2
171 hole_2_combination_W_realism        AH        6
172 hole_2_combination_W_realism        AH        2
173 hole_2_combination_W_realism        AH        4

Thanks in advance

[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org<mailto:R-help at r-project.org>   mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.

       [[alternative HTML version deleted]]
     [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.
       [[alternative HTML version deleted]]

Dear Greg,
many thanks for your answer. Now I have a problem then in understanding how to 
check 

normality in case of ANOVA with repeated measures.
I would need an help with a numeric example, as I haven?tu fully understood how 
it works with the
proj() command as it as suggested by another R user in this mailing list.

For example, in attachment you find a .csv table resulting from an experiment, 
you can access it by means of this command:
scrd<-  
read.csv(file='/Users/....../tables_for_R/table_quality_wood.csv',sep=',',header=T)

The data are from an experiment where participants had to evaluate on a seven 
point likert scale
the realism of some stimuli, which are presented both in condition "A" and in 
condition "AH".

I need to perform the ANOVA by means of this command:
aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), 
data=scrd)
but the problem is that I cannot plot as usually do the qqnorm on the residuals 
of the fit because
lm does not support the Error term present in aov.
I normally check normality through a plot (or a shapiro.test function). Now 
could you please 

illustrate me how will you be able to undestand from my data if they are 
normally distributed?

Please enlighten me

Best regards
<r-help at r-project.org>
Sent: Fri, January 7, 2011 7:34:05 PM
Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality

A lot of this depends on what question you are really trying to answer.  For one 
way anova replacing y-values with their ranks essentially transforms the 
distribution to uniform (under the null) and the Central Limit Theorem kicks in 
for the uniform with samples larger than about 5, so the normal approximations 
are pretty good and the theory works, but what are you actually testing?  The 
most meaningful null that is being tested is that all data come from the exact 
same distribution.  So what does it mean when you reject that null?  It means 
that all the groups are not representing the same distribution, but is that 
because the means differ? Or the variances? Or the shapes? It can be any of 
those.  Some point out that if you make certain assumptions such as symmetry or 
shifts of the same distributions, then you can talk about differences in means 
or medians, but usually if I am using non-parametrics it is because I don't 
believe that things are symmetric and the shift idea doesn't fit in my mind.

Some alternatives include bootstrapping or permutation tests, or just 
transforming the data to get something closer to normal.

Now what does replacing by ranks do in 2-way anova where we want to test the 
difference in one factor without making assumptions about whether the other 
factor has an effect or not?  I'm not sure on this one.

I have seen regression on ranks, it basically tests for some level of 
relationship, but regression is usually used for some type of prediction and 
predicting from a rank-rank regression does not seem meaningful to me.

Fitting the regression model does not require normality, it is the tests on the 
coefficients and confidence and prediction intervals that assume normality 
(again the CLT helps for large samples (but not for prediction intervals)).  
Bootstrapping is an option for regression without assuming normality, 
transformations can also help.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Ben Ward
> Sent: Thursday, January 06, 2011 2:00 PM
> To: r-help at r-project.org
> Subject: Re: [R] Assumptions for ANOVA: the right way to check the
> normality
>
> On 06/01/2011 20:29, Greg Snow wrote:
> > Some would argue to always use the kruskal wallis test since we never
> know for sure if we have normality.  Personally I am not sure that I
> understand what exactly that test is really testing.  Plus in your case
> you are doing a two-way anova and kruskal.test does one-way, so it will
> not work for your case.  There are other non-parametric options.
> Just read this and had queries of my own and comments on this subject:
> Would one of these options be to rank the data before doing whatever
> model or test you want to do? As I understand it makes the place of the
> data the same, but pulls extreme cases closer to the rest. Not an
> expert
> though.
> I've been doing lm() for my work, and I don't know if that makes an
> assumption of normality (may data is not normal). And I'm unsure of any
> other assumptions as my texts don't really discuss them. Although I can
> comfortably evaluate a model say using residual vs fitted, and F values
> turned to P, resampling and confidence intervals, and looking at sums
> of
> squares terms add to explanation of the model. I've tried the plot()
> function to help graphically evaluate a model, and I want to make sure
> I
> understand what it's showing me. I think the first, is showing me the
> models fitted values vs the residuals, and ideally, I think the closer
> the points are to the red line the better. The next plot is a Q-Q plot,
> the closer the points to the line, the more normal the model
> coefficients (or perhaps the data). I'm not sure what the next two
> plots
> are, but it is titled Scale-Location. And it looks to have the square
> root of standardized residuals on y, and fitted model values on x.
> Might
> this be similar to the first plot? The final one is titled Residuals vs
> Leverage, which has standardized residuals on y and leverage on x, and
> something called Cooks Distance is plotted as well.
>
> Thanks,
> Ben. W
> > Whether to use anova and other normality based tests is really a
> matter of what assumptions you are willing to live with and what level
> of "close enough" you are comfortable with.  Consulting with a local
> consultant with experience in these areas is useful if you don't have
> enough experience to decide what you are comfortable with.
> >
> > For your description, I would try the proportional odds logistic
> regression, but again, you should probably consult with someone who has
> experience rather than trying that on your own until you have more
> training and experience.
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org
> > 801.408.8111
> >
> > From: Frodo Jedi [mailto:frodo.jedi at yahoo.com]
> > Sent: Thursday, January 06, 2011 12:57 PM
> > To: Greg Snow; r-help at r-project.org
> > Subject: Re: [R] Assumptions for ANOVA: the right way to check the
> normality
> >
> >
> > Ok,
> > I see ;-)
> >
> > Let?s put in this way then. When do I have to use the kruskal wallis
> test? I mean, when I am very sure that I have
> > to use it instead of ANOVA?
> >
> > Thanks
> >
> >
> > Best regards
> >
> > P.S.  In addition, which is the non parametric methods corresponding
> to a 2 ways anova?..or have I to
> > repeat many times the kruskal wallis test?
> > ________________________________
> > From: Greg Snow<Greg.Snow at imail.org>
> > To: Frodo Jedi<frodo.jedi at yahoo.com>; Robert Baer<rbaer at atsu.edu>;
> "r-help at r-project.org"<r-help at r-project.org>
> > Sent: Thu, January 6, 2011 7:07:17 PM
> > Subject: RE: [R] Assumptions for ANOVA: the right way to check the
> normality
> >
> > Remember that an non-significant result (especially one that is still
> near alpha like yours) does not give evidence that the null is true.
> The reason that the 1st 2 tests below don't show significance is more
> due to lack of power than some of the residuals being normal.  The only
> test that I would trust for this is SnowsPenultimateNormalityTest
> (TeachingDemos package, the help page is more useful than the function
> itself).
> >
> > But I think that you are mixing up 2 different concepts (a very
> common misunderstanding).  What is important if we want to do normal
> theory inference is that the coefficients/effects/estimates are
> normally distributed.  Now since these coefficients can be shown to be
> linear combinations of the error terms, if the errors are iid normal
> then the coefficients are also normally distributed.  So many people
> want to show that the residuals come from a perfectly normal
> distribution.  But it is the theoretical errors, not the observed
> residuals that are important (the observed residuals are not iid).  You
> need to think about the source of your data to see if this is a
> reasonable assumption.  Now I cannot fathom any universe (theoretical
> or real) in which normally distributed errors added to means that they
> are independent of will result in a finite set of integers, so an
> assumption of exact normality is not reasonable (some may want to argue
> this, but convincing me will be very difficult).  But looking for exact
> normality is a bit of a red herring because, we also have the Central
> Limit Theorem that says that if the errors are not normal (but still
> iid) then the distribution of the coefficients will approach normality
> as the sample size increases.  This is what make statistics doable
> (because no real dataset entered into the computer is exactly normal).
> The more important question is are the residuals "normal enough"?  for
> which there is not a definitive test (experience and plots help).
> >
> > But this all depends on another assumption that I don't think that
> you have even considered.  Yes we can use normal theory even when the
> random part of the data is not normally distributed, but this still
> assumes that the data is at least interval data, i.e. that we firmly
> believe that the difference between a response of 1 and a response of 2
> is exactly the same as a difference between a 6 and a 7 and that the
> difference from 4 to 6 is exactly twice that of 1 vs. 2.  From your
> data and other descriptions, I don't think that that is a reasonable
> assumption.  If you are not willing to make that assumption (like me)
> then means and normal theory tests are meaningless and you should use
> other approaches.  One possibility is to use non-parametric methods
> (which I believe Frank has already suggested you use), another is to
> use proportional odds logistic regression.
> >
> >
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org<mailto:greg.snow at imail.org>
> > 801.408.8111
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org<mailto:r-help-bounces at r-
> project.org>  [mailto:r-help-bounces at r-
> >> project.org<http://project.org>] On Behalf Of Frodo Jedi
> >> Sent: Wednesday, January 05, 2011 3:22 PM
> >> To: Robert Baer; r-help at r-project.org<mailto:r-help at r-project.org>
> >> Subject: Re: [R] Assumptions for ANOVA: the right way to check the
> >> normality
> >>
> >> Dear Robert,
> >> thanks so much!!!  Now I understand!
> >> So you also think that I have to check only the residuals and not
> the
> >> data
> >> directly.
> >> Now just for curiosity I did the the shapiro test on the residuals.
> The
> >> problem
> >> is that on fit3 I don?t get from the test
> >> that the data are normally distribuited. Why? Here the data:
> >>
> >>> shapiro.test(residuals(fit1))
> >>     Shapiro-Wilk normality test
> >>
> >> data:  residuals(fit1)
> >> W = 0.9848, p-value = 0.05693
> >>
> >> #Here the test is ok: the test says that the data are distributed
> >> normally
> >> (p-value greather than 0.05)
> >>
> >>
> >>
> >>> shapiro.test(residuals(fit2))
> >>     Shapiro-Wilk normality test
> >>
> >> data:  residuals(fit2)
> >> W = 0.9853, p-value = 0.06525
> >>
> >> #Here the test is ok: the test says that the data are distributed
> >> normally
> >> (p-value greather than 0.05)
> >>
> >>
> >>
> >>> shapiro.test(residuals(fit3))
> >>     Shapiro-Wilk normality test
> >>
> >> data:  residuals(fit3)
> >> W = 0.9621, p-value = 0.0001206
> >>
> >>
> >>
> >> Now the test reveals p-value lower than 0.05: so the residuals for
> fit3
> >> are not
> >> distributed normally....
> >> Why I get this beheaviour? Indeed in the histogram and Q-Q plot for
> >> fit3
> >> residuals I get a normal distribution.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: Robert Baer<rbaer at atsu.edu<mailto:rbaer at atsu.edu>>
> >>
> >> Sent: Wed, January 5, 2011 8:56:50 PM
> >> Subject: Re: [R] Assumptions for ANOVA: the right way to check the
> >> normality
> >>
> >>> Someone suggested me that I don?t have to check the normality of
> the
> >> data, but
> >>> the normality of the residuals I get after the fitting of the
> linear
> >> model.
> >>> I really ask you to help me to understand this point as I don?t
> find
> >> enough
> >>> material online where to solve it.
> >> Try the following:
> >> # using your scrd data and your proposed models
> >> fit1<- lm(response ~ stimulus + condition + stimulus:condition,
> >> data=scrd)
> >> fit2<- lm(response ~ stimulus + condition, data=scrd)
> >> fit3<- lm(response ~ condition, data=scrd)
> >>
> >> # Set up for 6 plots on 1 panel
> >> op = par(mfrow=c(2,3))
> >>
> >> # residuals function extracts residuals
> >> # Visual inspection is a good start for checking normality
> >> # You get a much better feel than from some "magic number" statistic
> >> hist(residuals(fit1))
> >> hist(residuals(fit2))
> >> hist(residuals(fit3))
> >>
> >> # especially qqnorm() plots which are linear for normal data
> >> qqnorm(residuals(fit1))
> >> qqnorm(residuals(fit2))
> >> qqnorm(residuals(fit3))
> >>
> >> # Restore plot parameters
> >> par(op)
> >>
> >>> If the data are not normally distributed I have to use the kruskal
> >> wallys test
> >>> and not the ANOVA...so please help
> >>> me to understand.
> >> Indeed - Kruskal-Wallis is a good test to use for one factor data
> that
> >> is
> >> ordinal so it is a good alternative to your fit3.
> >> Your "response" seems to be a discrete variable rather than a
> >> continuous
> >> variable.
> >> You must decide if it is reasonable to approximate it with a normal
> >> distribution
> >> which is by definition continuous.
> >>
> >>> I make a numerical example, could you please tell me if the data in
> >> this table
> >>> are normally distributed or not?
> >>>
> >>> Help!
> >>>
> >>>
> >>> number                  stimulus condition response
> >>> 1            flat_550_W_realism        A        3
> >>> 2            flat_550_W_realism        A        3
> >>> 3            flat_550_W_realism        A        5
> >>> 4            flat_550_W_realism        A        3
> >>> 5            flat_550_W_realism        A        3
> >>> 6            flat_550_W_realism        A        3
> >>> 7            flat_550_W_realism        A        3
> >>> 8            flat_550_W_realism        A        5
> >>> 9            flat_550_W_realism        A        3
> >>> 10            flat_550_W_realism        A        3
> >>> 11            flat_550_W_realism        A        5
> >>> 12            flat_550_W_realism        A        7
> >>> 13            flat_550_W_realism        A        5
> >>> 14            flat_550_W_realism        A        2
> >>> 15            flat_550_W_realism        A        3
> >>> 16            flat_550_W_realism        AH        7
> >>> 17            flat_550_W_realism        AH        4
> >>> 18            flat_550_W_realism        AH        5
> >>> 19            flat_550_W_realism        AH        3
> >>> 20            flat_550_W_realism        AH        6
> >>> 21            flat_550_W_realism        AH        5
> >>> 22            flat_550_W_realism        AH        3
> >>> 23            flat_550_W_realism        AH        5
> >>> 24            flat_550_W_realism        AH        5
> >>> 25            flat_550_W_realism        AH        7
> >>> 26            flat_550_W_realism        AH        2
> >>> 27            flat_550_W_realism        AH        7
> >>> 28            flat_550_W_realism        AH        5
> >>> 29            flat_550_W_realism        AH        5
> >>> 30        bump_2_step_W_realism        A        1
> >>> 31        bump_2_step_W_realism        A        3
> >>> 32        bump_2_step_W_realism        A        5
> >>> 33        bump_2_step_W_realism        A        1
> >>> 34        bump_2_step_W_realism        A        3
> >>> 35        bump_2_step_W_realism        A        2
> >>> 36        bump_2_step_W_realism        A        5
> >>> 37        bump_2_step_W_realism        A        4
> >>> 38        bump_2_step_W_realism        A        4
> >>> 39        bump_2_step_W_realism        A        4
> >>> 40        bump_2_step_W_realism        A        4
> >>> 41        bump_2_step_W_realism        AH        3
> >>> 42        bump_2_step_W_realism        AH        5
> >>> 43        bump_2_step_W_realism        AH        1
> >>> 44        bump_2_step_W_realism        AH        5
> >>> 45        bump_2_step_W_realism        AH        4
> >>> 46        bump_2_step_W_realism        AH        4
> >>> 47        bump_2_step_W_realism        AH        5
> >>> 48        bump_2_step_W_realism        AH        4
> >>> 49        bump_2_step_W_realism        AH        3
> >>> 50        bump_2_step_W_realism        AH        4
> >>> 51        bump_2_step_W_realism        AH        5
> >>> 52        bump_2_step_W_realism        AH        4
> >>> 53        hole_2_step_W_realism        A        3
> >>> 54        hole_2_step_W_realism        A        3
> >>> 55        hole_2_step_W_realism        A        4
> >>> 56        hole_2_step_W_realism        A        1
> >>> 57        hole_2_step_W_realism        A        4
> >>> 58        hole_2_step_W_realism        A        3
> >>> 59        hole_2_step_W_realism        A        5
> >>> 60        hole_2_step_W_realism        A        4
> >>> 61        hole_2_step_W_realism        A        3
> >>> 62        hole_2_step_W_realism        A        4
> >>> 63        hole_2_step_W_realism        A        7
> >>> 64        hole_2_step_W_realism        A        5
> >>> 65        hole_2_step_W_realism        A        1
> >>> 66        hole_2_step_W_realism        A        4
> >>> 67        hole_2_step_W_realism        AH        7
> >>> 68        hole_2_step_W_realism        AH        5
> >>> 69        hole_2_step_W_realism        AH        5
> >>> 70        hole_2_step_W_realism        AH        1
> >>> 71        hole_2_step_W_realism        AH        5
> >>> 72        hole_2_step_W_realism        AH        5
> >>> 73        hole_2_step_W_realism        AH        5
> >>> 74        hole_2_step_W_realism        AH        2
> >>> 75        hole_2_step_W_realism        AH        6
> >>> 76        hole_2_step_W_realism        AH        5
> >>> 77        hole_2_step_W_realism        AH        5
> >>> 78        hole_2_step_W_realism        AH        6
> >>> 79    bump_2_heel_toe_W_realism        A        3
> >>> 80    bump_2_heel_toe_W_realism        A        3
> >>> 81    bump_2_heel_toe_W_realism        A        3
> >>> 82    bump_2_heel_toe_W_realism        A        2
> >>> 83    bump_2_heel_toe_W_realism        A        3
> >>> 84    bump_2_heel_toe_W_realism        A        3
> >>> 85    bump_2_heel_toe_W_realism        A        4
> >>> 86    bump_2_heel_toe_W_realism        A        3
> >>> 87    bump_2_heel_toe_W_realism        A        4
> >>> 88    bump_2_heel_toe_W_realism        A        4
> >>> 89    bump_2_heel_toe_W_realism        A        6
> >>> 90    bump_2_heel_toe_W_realism        A        5
> >>> 91    bump_2_heel_toe_W_realism        A        4
> >>> 92    bump_2_heel_toe_W_realism        AH        7
> >>> 93    bump_2_heel_toe_W_realism        AH        3
> >>> 94    bump_2_heel_toe_W_realism        AH        4
> >>> 95    bump_2_heel_toe_W_realism        AH        2
> >>> 96    bump_2_heel_toe_W_realism        AH        5
> >>> 97    bump_2_heel_toe_W_realism        AH        6
> >>> 98    bump_2_heel_toe_W_realism        AH        4
> >>> 99    bump_2_heel_toe_W_realism        AH        4
> >>> 100    bump_2_heel_toe_W_realism        AH        4
> >>> 101    bump_2_heel_toe_W_realism        AH        5
> >>> 102    bump_2_heel_toe_W_realism        AH        2
> >>> 103    bump_2_heel_toe_W_realism        AH        6
> >>> 104    bump_2_heel_toe_W_realism        AH        5
> >>> 105    hole_2_heel_toe_W_realism        A        3
> >>> 106    hole_2_heel_toe_W_realism        A        3
> >>> 107    hole_2_heel_toe_W_realism        A        1
> >>> 108    hole_2_heel_toe_W_realism        A        3
> >>> 109    hole_2_heel_toe_W_realism        A        3
> >>> 110    hole_2_heel_toe_W_realism        A        5
> >>> 111    hole_2_heel_toe_W_realism        A        2
> >>> 112    hole_2_heel_toe_W_realism        AH        5
> >>> 113    hole_2_heel_toe_W_realism        AH        1
> >>> 114    hole_2_heel_toe_W_realism        AH        3
> >>> 115    hole_2_heel_toe_W_realism        AH        6
> >>> 116    hole_2_heel_toe_W_realism        AH        5
> >>> 117    hole_2_heel_toe_W_realism        AH        4
> >>> 118    hole_2_heel_toe_W_realism        AH        4
> >>> 119    hole_2_heel_toe_W_realism        AH        3
> >>> 120    hole_2_heel_toe_W_realism        AH        3
> >>> 121    hole_2_heel_toe_W_realism        AH        1
> >>> 122    hole_2_heel_toe_W_realism        AH        5
> >>> 123 bump_2_combination_W_realism        A        4
> >>> 124 bump_2_combination_W_realism        A        2
> >>> 125 bump_2_combination_W_realism        A        4
> >>> 126 bump_2_combination_W_realism        A        1
> >>> 127 bump_2_combination_W_realism        A        4
> >>> 128 bump_2_combination_W_realism        A        4
> >>> 129 bump_2_combination_W_realism        A        2
> >>> 130 bump_2_combination_W_realism        A        4
> >>> 131 bump_2_combination_W_realism        A        2
> >>> 132 bump_2_combination_W_realism        A        4
> >>> 133 bump_2_combination_W_realism        A        2
> >>> 134 bump_2_combination_W_realism        A        6
> >>> 135 bump_2_combination_W_realism        AH        7
> >>> 136 bump_2_combination_W_realism        AH        3
> >>> 137 bump_2_combination_W_realism        AH        4
> >>> 138 bump_2_combination_W_realism        AH        1
> >>> 139 bump_2_combination_W_realism        AH        6
> >>> 140 bump_2_combination_W_realism        AH        5
> >>> 141 bump_2_combination_W_realism        AH        5
> >>> 142 bump_2_combination_W_realism        AH        6
> >>> 143 bump_2_combination_W_realism        AH        5
> >>> 144 bump_2_combination_W_realism        AH        4
> >>> 145 bump_2_combination_W_realism        AH        2
> >>> 146 bump_2_combination_W_realism        AH        4
> >>> 147 bump_2_combination_W_realism        AH        2
> >>> 148 bump_2_combination_W_realism        AH        5
> >>> 149 hole_2_combination_W_realism        A        5
> >>> 150 hole_2_combination_W_realism        A        2
> >>> 151 hole_2_combination_W_realism        A        4
> >>> 152 hole_2_combination_W_realism        A        1
> >>> 153 hole_2_combination_W_realism        A        5
> >>> 154 hole_2_combination_W_realism        A        4
> >>> 155 hole_2_combination_W_realism        A        3
> >>> 156 hole_2_combination_W_realism        A        5
> >>> 157 hole_2_combination_W_realism        A        2
> >>> 158 hole_2_combination_W_realism        A        5
> >>> 159 hole_2_combination_W_realism        A        5
> >>> 160 hole_2_combination_W_realism        A        1
> >>> 161 hole_2_combination_W_realism        AH        7
> >>> 162 hole_2_combination_W_realism        AH        5
> >>> 163 hole_2_combination_W_realism        AH        3
> >>> 164 hole_2_combination_W_realism        AH        1
> >>> 165 hole_2_combination_W_realism        AH        6
> >>> 166 hole_2_combination_W_realism        AH        4
> >>> 167 hole_2_combination_W_realism        AH        7
> >>> 168 hole_2_combination_W_realism        AH        5
> >>> 169 hole_2_combination_W_realism        AH        5
> >>> 170 hole_2_combination_W_realism        AH        2
> >>> 171 hole_2_combination_W_realism        AH        6
> >>> 172 hole_2_combination_W_realism        AH        2
> >>> 173 hole_2_combination_W_realism        AH        4
> >>>
> >>>
> >>>
> >>>
> >>> Thanks in advance
> >>>
> >>>
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>>
> >>
> >>
> >>> ______________________________________________
> >>> R-help at r-project.org<mailto:R-help at r-project.org>  mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-
> >> guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
>
> >>
> >>
> >>       [[alternative HTML version deleted]]
> >
> >     [[alternative HTML version deleted]]
> >
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>       [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110110/e78a08f6/attachment.pl>
I can't get hotmail to indicate the original text so I'm going to top
post. There seems to be a lot of back and forth here, let me see if these
comment help guide discussion a bit. 

I tried to run some histograms of your experiment (prior to a bunch of other things )
and IIRC in many cases
you have counts under 10. At minimum, anything you do or any test you
run you want to do some senistivity analyses and perturb your data a bit.
Your objective of course is important- say you want to calibrate your
response data and try to validate your assumption that your survey 
question relfect some continuous variable ( but a respondent can only
round his response to an int as in teh case of taking a temperature for example, otherwise
all you can really say is that these things are like ranks, 7>6>5 etc ). 
Personally I always avoid non-parametrics
( just personal bias) but with small samples and a response that is closer
to a rank than a continuous variable with some meaning, it may make sense.

If you plot hitograms of responses versus A and AH, visually they look
different, you could try fitting the histos to various pdf's and see what you
get etc. This is all retro/post-hoc so you may as well explore away.

From: Greg.Snow at imail.org
To: frodo.jedi at yahoo.com
Date: Mon, 10 Jan 2011 11:26:05 -0700
CC: r-help at r-project.org
Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality

What is the question you are really trying to find the answer for?  Knowing that may help us give more meaningful answers.

You keep wanting to test the residuals for normality, but it looks like you are doing it because some outdate recipe suggests it rather than that you understand why.

It is fairly easy to create a distribution that is definitely not normal, that gives the wrong answer most of the time if normality is assumed, yet will pass most normality tests most of the time (well except for SnowsPenualtimateNormalityTest, but that one has an unfair advantage in this situation).  So just because the residuals look normal (or close enough) does not mean that the theory holds.

R. A. Fisher is said to have said that the quality of a statistician can be judged by the amount of rat droppings under his finger nails.  Now if we take that literally, then I must not be very good.  But more what he meant is that a statistician must understand the source of the data, not just get a file and put it through some canned routines.  So these questions are really for you or the source of your data.

Also remember that the normality of the data/residuals/etc. is not as important as the CLT for your sample size.  The main things that make the CLT not work (for samples that are not large enough) are outliers and strong skewness, since your outcome is limited to the numbers 1-7, I don't see outliers or skewness being a real problem.  So you are probably fine for fixed effects style models (though checking with experts in your area or doing simulations can support/counter this).  But when you add in random effects then there is a lot of uncertainty about if the normal theory still holds, the latest lme code uses mcmc sampling rather than depending on normal theory and is still being developed.

This now comes back to my first question: what are you trying to find out?

You may not need to do anova or that type of model.  Some simple hypotheses may be answered using McNemars test on your data.  If you want to do predictions then linear models will be meaningless (what would a prediction of -3.2, 4.493, or 8.1 mean on a 7 point likert scale?) and something like proportional odds logistic regression will be much more meaningful.  Between those are bootstrap and permutation methods that may answer you question without any normality assumptions.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

From: Frodo Jedi [mailto:frodo.jedi at yahoo.com]
Sent: Saturday, January 08, 2011 3:20 AM
To: Greg Snow
Cc: r-help at r-project.org
Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality

Dear Greg,
many thanks for your answer. Now I have a problem then in understanding how to check
normality in case of ANOVA with repeated measures.
I would need an help with a numeric example, as I haven?tu fully understood how it works with the
proj() command as it as suggested by another R user in this mailing list.

For example, in attachment you find a .csv table resulting from an experiment, you can access it by means of this command:
scrd<- read.csv(file='/Users/....../tables_for_R/table_quality_wood.csv',sep=',',header=T)
The data are from an experiment where participants had to evaluate on a seven point likert scale
the realism of some stimuli, which are presented both in condition "A" and in condition "AH".

I need to perform the ANOVA by means of this command:
aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)
but the problem is that I cannot plot as usually do the qqnorm on the residuals of the fit because
lm does not support the Error term present in aov.
I normally check normality through a plot (or a shapiro.test function). Now could you please
illustrate me how will you be able to undestand from my data if they are normally distributed?

Please enlighten me

Best regards
A lot of this depends on what question you are really trying to answer.  For one way anova replacing y-values with their ranks essentially transforms the distribution to uniform (under the null) and the Central Limit Theorem kicks in for the uniform with samples larger than about 5, so the normal approximations are pretty good and the theory works, but what are you actually testing?  The most meaningful null that is being tested is that all data come from the exact same distribution.  So what does it mean when you reject that null?  It means that all the groups are not representing the same distribution, but is that because the means differ? Or the variances? Or the shapes? It can be any of those.  Some point out that if you make certain assumptions such as symmetry or shifts of the same distributions, then you can talk about differences in means or medians, but usually if I am using non-parametrics it is because I don't believe that things are symmetric and the shift idea doesn't fit in my mind.

Some alternatives include bootstrapping or permutation tests, or just transforming the data to get something closer to normal.

Now what does replacing by ranks do in 2-way anova where we want to test the difference in one factor without making assumptions about whether the other factor has an effect or not?  I'm not sure on this one.

I have seen regression on ranks, it basically tests for some level of relationship, but regression is usually used for some type of prediction and predicting from a rank-rank regression does not seem meaningful to me.

Fitting the regression model does not require normality, it is the tests on the coefficients and confidence and prediction intervals that assume normality (again the CLT helps for large samples (but not for prediction intervals)).  Bootstrapping is an option for regression without assuming normality, transformations can also help.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Ben Ward
> Sent: Thursday, January 06, 2011 2:00 PM
> To: r-help at r-project.org
> Subject: Re: [R] Assumptions for ANOVA: the right way to check the
> normality
>
> On 06/01/2011 20:29, Greg Snow wrote:
> > Some would argue to always use the kruskal wallis test since we never
> know for sure if we have normality.  Personally I am not sure that I
> understand what exactly that test is really testing.  Plus in your case
> you are doing a two-way anova and kruskal.test does one-way, so it will
> not work for your case.  There are other non-parametric options.
> Just read this and had queries of my own and comments on this subject:
> Would one of these options be to rank the data before doing whatever
> model or test you want to do? As I understand it makes the place of the
> data the same, but pulls extreme cases closer to the rest. Not an
> expert
> though.
> I've been doing lm() for my work, and I don't know if that makes an
> assumption of normality (may data is not normal). And I'm unsure of any
> other assumptions as my texts don't really discuss them. Although I can
> comfortably evaluate a model say using residual vs fitted, and F values
> turned to P, resampling and confidence intervals, and looking at sums
> of
> squares terms add to explanation of the model. I've tried the plot()
> function to help graphically evaluate a model, and I want to make sure
> I
> understand what it's showing me. I think the first, is showing me the
> models fitted values vs the residuals, and ideally, I think the closer
> the points are to the red line the better. The next plot is a Q-Q plot,
> the closer the points to the line, the more normal the model
> coefficients (or perhaps the data). I'm not sure what the next two
> plots
> are, but it is titled Scale-Location. And it looks to have the square
> root of standardized residuals on y, and fitted model values on x.
> Might
> this be similar to the first plot? The final one is titled Residuals vs
> Leverage, which has standardized residuals on y and leverage on x, and
> something called Cooks Distance is plotted as well.
>
> Thanks,
> Ben. W
> > Whether to use anova and other normality based tests is really a
> matter of what assumptions you are willing to live with and what level
> of "close enough" you are comfortable with.  Consulting with a local
> consultant with experience in these areas is useful if you don't have
> enough experience to decide what you are comfortable with.
> >
> > For your description, I would try the proportional odds logistic
> regression, but again, you should probably consult with someone who has
> experience rather than trying that on your own until you have more
> training and experience.
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org
> > 801.408.8111
> >
> > From: Frodo Jedi [mailto:frodo.jedi at yahoo.com]
> > Sent: Thursday, January 06, 2011 12:57 PM
> > To: Greg Snow; r-help at r-project.org
> > Subject: Re: [R] Assumptions for ANOVA: the right way to check the
> normality
> >
> >
> > Ok,
> > I see ;-)
> >
> > Let?s put in this way then. When do I have to use the kruskal wallis
> test? I mean, when I am very sure that I have
> > to use it instead of ANOVA?
> >
> > Thanks
> >
> >
> > Best regards
> >
> > P.S.  In addition, which is the non parametric methods corresponding
> to a 2 ways anova?..or have I to
> > repeat many times the kruskal wallis test?
> > ________________________________
> > From: Greg Snow>
> > To: Frodo Jedi>; Robert Baer>;
> "r-help at r-project.org">
> > Sent: Thu, January 6, 2011 7:07:17 PM
> > Subject: RE: [R] Assumptions for ANOVA: the right way to check the
> normality
> >
> > Remember that an non-significant result (especially one that is still
> near alpha like yours) does not give evidence that the null is true.
> The reason that the 1st 2 tests below don't show significance is more
> due to lack of power than some of the residuals being normal.  The only
> test that I would trust for this is SnowsPenultimateNormalityTest
> (TeachingDemos package, the help page is more useful than the function
> itself).
> >
> > But I think that you are mixing up 2 different concepts (a very
> common misunderstanding).  What is important if we want to do normal
> theory inference is that the coefficients/effects/estimates are
> normally distributed.  Now since these coefficients can be shown to be
> linear combinations of the error terms, if the errors are iid normal
> then the coefficients are also normally distributed.  So many people
> want to show that the residuals come from a perfectly normal
> distribution.  But it is the theoretical errors, not the observed
> residuals that are important (the observed residuals are not iid).  You
> need to think about the source of your data to see if this is a
> reasonable assumption.  Now I cannot fathom any universe (theoretical
> or real) in which normally distributed errors added to means that they
> are independent of will result in a finite set of integers, so an
> assumption of exact normality is not reasonable (some may want to argue
> this, but convincing me will be very difficult).  But looking for exact
> normality is a bit of a red herring because, we also have the Central
> Limit Theorem that says that if the errors are not normal (but still
> iid) then the distribution of the coefficients will approach normality
> as the sample size increases.  This is what make statistics doable
> (because no real dataset entered into the computer is exactly normal).
> The more important question is are the residuals "normal enough"?  for
> which there is not a definitive test (experience and plots help).
> >
> > But this all depends on another assumption that I don't think that
> you have even considered.  Yes we can use normal theory even when the
> random part of the data is not normally distributed, but this still
> assumes that the data is at least interval data, i.e. that we firmly
> believe that the difference between a response of 1 and a response of 2
> is exactly the same as a difference between a 6 and a 7 and that the
> difference from 4 to 6 is exactly twice that of 1 vs. 2.  From your
> data and other descriptions, I don't think that that is a reasonable
> assumption.  If you are not willing to make that assumption (like me)
> then means and normal theory tests are meaningless and you should use
> other approaches.  One possibility is to use non-parametric methods
> (which I believe Frank has already suggested you use), another is to
> use proportional odds logistic regression.
> >
> >
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org>
> > 801.408.8111
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org
> project.org>  [mailto:r-help-bounces at r-
> >> project.org] On Behalf Of Frodo Jedi
> >> Sent: Wednesday, January 05, 2011 3:22 PM
> >> To: Robert Baer; r-help at r-project.org>
> >> Subject: Re: [R] Assumptions for ANOVA: the right way to check the
> >> normality
> >>
> >> Dear Robert,
[[elided Hotmail spam]]
> >> So you also think that I have to check only the residuals and not
> the
> >> data
> >> directly.
> >> Now just for curiosity I did the the shapiro test on the residuals.
> The
> >> problem
> >> is that on fit3 I don?t get from the test
> >> that the data are normally distribuited. Why? Here the data:
> >>
> >>> shapiro.test(residuals(fit1))
> >>    Shapiro-Wilk normality test
> >>
> >> data:  residuals(fit1)
> >> W = 0.9848, p-value = 0.05693
> >>
> >> #Here the test is ok: the test says that the data are distributed
> >> normally
> >> (p-value greather than 0.05)
> >>
> >>
> >>
> >>> shapiro.test(residuals(fit2))
> >>    Shapiro-Wilk normality test
> >>
> >> data:  residuals(fit2)
> >> W = 0.9853, p-value = 0.06525
> >>
> >> #Here the test is ok: the test says that the data are distributed
> >> normally
> >> (p-value greather than 0.05)
> >>
> >>
> >>
> >>> shapiro.test(residuals(fit3))
> >>    Shapiro-Wilk normality test
> >>
> >> data:  residuals(fit3)
> >> W = 0.9621, p-value = 0.0001206
> >>
> >>
> >>
> >> Now the test reveals p-value lower than 0.05: so the residuals for
> fit3
> >> are not
> >> distributed normally....
> >> Why I get this beheaviour? Indeed in the histogram and Q-Q plot for
> >> fit3
> >> residuals I get a normal distribution.
> >>
> >>
> >>
> >>
> >>
> >>
[ junk deleted, after hotmrail graciously allowed my ui to respond ]