Ok with a "small amount" of non-normality?
The question that you ask does not admit of an easy answer! Hopefully the following will shed light on the general tack to be taken.

One can do simulations and check the effect of one or other level of skewness (usually skewness to the right) on the parameter estimates.

The residuals (whether level 0 or level 1 in your case) are rarely the right quantities to check for normality. The way they are offered as a source of insight in the Pinheiro and Bates book seems to me misleading in this respect.

Consider a split-plot design, with treatments estimated at the level of plots within blocks, as in the kiwishade dataset in the DAAG package. What matters for comparing treatments is the (approximate) normality of what in the Genstat world would be called effects at the plot level. There are just 12 of these. For this balanced design they can be obtained by basing the analysis on the plot means; they are the residuals from that analysis. Any skewness at the subplot level gets somewhat averaged out. (There are 4 subplot values per plot.) The residuals from the lme model, whether at the subplot or plot level, will exaggerate any skewness that may be due to variation between subplots. Even in this simple case, these 'effect' estimates are correlated, which somewhat complicates checking them for normality.

Direct checks of the distribution of the relevant quantities get quite messy for unbalanced designs. Bootstrap methods that have regard to the covariance structure might be considered. Or make a stab at the distributions of the relevant component effects (now in the lme or lmer sense), and simulate.

John Maindonald
email: john.maindonald at anu.edu.au
phone: +61 2 (6125)3473
fax: +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
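A minimal sketch of the kind of simulation suggested in the reply above. It is written in Python with only the standard library so that it is self-contained; the thread itself concerns R, and nothing here reproduces lme/lmer output or the real kiwishade data. Everything in it is an assumption for illustration: a hypothetical balanced layout loosely modelled on kiwishade (3 blocks x 4 treatments = 12 plots, 4 subplots per plot), error only at the subplot level, drawn from a centred exponential (skewness 2), and no true treatment effects. The plot-means analysis is an ordinary additive block + treatment fit to the 12 means. The sketch checks two things: how much subplot skewness survives averaging to plot means (for a mean of n independent values, skewness shrinks by a factor of sqrt(n), so 2 should fall to about 1 here), and how often the nominal 95% t-interval for a treatment difference actually covers the true value of 0.

```python
import math
import random
import statistics

# Hypothetical layout loosely modelled on kiwishade: 3 blocks x 4 treatments
# (12 plots), 4 subplots per plot, and no true treatment effects.
N_BLOCKS, N_TRT, N_SUB = 3, 4, 4
T_975_DF6 = 2.446912  # 97.5% quantile of Student's t with 6 df (hard-coded)


def skewness(xs):
    """Moment-based sample skewness g1."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m3 = statistics.fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def skewed_error(rng):
    """Assumed subplot error: centred Exp(1), with mean 0 and skewness 2."""
    return rng.expovariate(1.0) - 1.0


def plot_means(rng):
    """[block][treatment] table of plot means, each averaging N_SUB subplots."""
    return [[statistics.fmean([skewed_error(rng) for _ in range(N_SUB)])
             for _ in range(N_TRT)] for _ in range(N_BLOCKS)]


def interval_covers_zero(table):
    """Additive block + treatment fit to the 12 plot means; does the nominal
    95% t-interval for treatment 1 minus treatment 2 contain the true value 0?"""
    grand = statistics.fmean([v for row in table for v in row])
    blk = [statistics.fmean(row) for row in table]
    trt = [statistics.fmean([table[b][t] for b in range(N_BLOCKS)])
           for t in range(N_TRT)]
    # Residuals from the plot-means analysis (the plot-level "effects").
    sse = sum((table[b][t] - blk[b] - trt[t] + grand) ** 2
              for b in range(N_BLOCKS) for t in range(N_TRT))
    s2 = sse / ((N_BLOCKS - 1) * (N_TRT - 1))  # 6 residual df
    se = math.sqrt(2.0 * s2 / N_BLOCKS)  # SE of a difference of two treatment means
    return abs(trt[0] - trt[1]) <= T_975_DF6 * se


def main(seed=1, n_plots=20000, n_reps=10000):
    rng = random.Random(seed)
    # Skewness before and after averaging 4 subplot values per plot.
    sub = [skewed_error(rng) for _ in range(n_plots * N_SUB)]
    means = [statistics.fmean(sub[i:i + N_SUB]) for i in range(0, len(sub), N_SUB)]
    # Empirical coverage of the nominal 95% interval over repeated experiments.
    hits = sum(interval_covers_zero(plot_means(rng)) for _ in range(n_reps))
    return skewness(sub), skewness(means), hits / n_reps


if __name__ == "__main__":
    g_sub, g_plot, coverage = main()
    print(f"subplot skewness {g_sub:.2f}, plot-mean skewness {g_plot:.2f}, "
          f"coverage {coverage:.3f}")
```

`main()` returns the subplot skewness, the plot-mean skewness and the empirical coverage. Replacing `skewed_error` with another distribution, or adding a skewed plot-level term, is one way to probe "one or other level of skewness". If coverage stays near 0.95 under the skewness you consider plausible, that is evidence the treatment comparisons are not much affected.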
On 04/05/2013, at 4:36 AM, "Boulanger, Yan" <Yan.Boulanger at rncan-nrcan.gc.ca> wrote:
Hi folks,

This may be more of a "philosophical" student question. In Zuur et al. (2009), "Mixed effects models and extensions in ecology with R", it is stated on page 20 that "[...] we can get away with a small amount of non-normality". I'm a little bit puzzled when I face this kind of assertion in a textbook. What is really "a small amount"? Of course, it depends on your "judgement"...

In my case, I have level-0 and level-1 residuals that are unskewed and that show a relatively modest kurtosis (unbiased) of about 2.5-3.0. My models are based on several tens of thousands of individuals, and normality tests (e.g., shapiro.test) always fail for the residuals. QQ-plots show rather long tails, which correspond to "some" outliers (considering my data, there are several hundred "outliers" in this case). Homoscedasticity, whether or not random effects are considered, is not violated, so I wondered if I could rely on these models' estimates given the non-normality of the residuals.

My judgement in this case would be that the departure from normality is not that large and this might not be a problem. But, as an ecologist, not a statistician, I have a hard time convincing myself of this...

Any thoughts?

Thanks
Yan
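Why formal tests "always fail" at this sample size can be illustrated with a short sketch, again in Python with only the standard library. It swaps in the Jarque-Bera statistic, JB = n/6 (g1^2 + g2^2/4), where g1 is sample skewness and g2 sample excess kurtosis. Under normality JB is approximately chi-squared with 2 df, so p is approximately exp(-JB/2). This is used instead of shapiro.test because it is easy to compute without extra libraries; R's shapiro.test accepts only between 3 and 5000 observations in any case. The choices below are assumptions for illustration, not the poster's data: Student's t with 10 df serves as a stand-in for "mildly long-tailed" residuals (excess kurtosis 1), and n = 40000 is an assumed size in the "several tens of thousands" range.

```python
import math
import random
import statistics


def skew_kurt(xs):
    """Moment-based sample skewness g1 and excess kurtosis g2."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m3 = statistics.fmean([(x - m) ** 3 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def jarque_bera(xs):
    """Return (JB statistic, asymptotic p-value from chi-squared with 2 df)."""
    g1, g2 = skew_kurt(xs)
    jb = len(xs) / 6.0 * (g1 ** 2 + g2 ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)  # chi-squared(2) survival function is exp(-x/2)


def student_t(rng, df):
    """Draw from Student's t as Z / sqrt(V / df) with V ~ chi-squared(df)."""
    v = rng.gammavariate(df / 2.0, 2.0)  # Gamma(shape df/2, scale 2) = chi-squared(df)
    return rng.gauss(0.0, 1.0) / math.sqrt(v / df)


def main(seed=2013, n=40000, df=10):
    rng = random.Random(seed)
    normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mild = [student_t(rng, df) for _ in range(n)]  # assumed "mild" departure
    return jarque_bera(normal), jarque_bera(mild), skew_kurt(mild)


if __name__ == "__main__":
    (jb_n, p_n), (jb_t, p_t), (g1_t, g2_t) = main()
    print(f"normal: JB {jb_n:.1f}, p {p_n:.3g}")
    print(f"t(10):  JB {jb_t:.1f}, p {p_t:.3g}, excess kurtosis {g2_t:.2f}")
```

With n this large, even excess kurtosis of about 1 gives a vanishingly small p-value. So a failed test mainly says "not exactly normal", not whether the departure matters for the estimates; that is the question the simulation approach in the reply above is aimed at.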
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models