
Help with Linear Model

(Try to remember to keep the list in CC -- I'm really bad at it, too!)

Looking at your QQ plot (consider posting it somewhere online for
posterity, as the list strips attachments), it seems that the
deviation from normality takes the form of heavy tails of the sort
you would find with a t-distribution. If you were doing Bayesian
modelling, I would suggest that you just use a t-likelihood instead of
a normal one .... but that may not even be necessary here.
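(If you want to see what heavy tails look like on a QQ plot, here is a
quick simulation -- invented data, nothing to do with your model:)

```r
# Compare normal QQ plots of a normal sample and a heavy-tailed
# t-distributed sample: the t sample bends away from the reference
# line in both tails, which is the pattern described above.
set.seed(1)
x_norm <- rnorm(500)
x_t    <- rt(500, df = 3)   # t with 3 df: heavy tails

op <- par(mfrow = c(1, 2))
qqnorm(x_norm, main = "Normal sample"); qqline(x_norm)
qqnorm(x_t,    main = "t(3) sample");   qqline(x_t)
par(op)
```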

(For one thing, it's not the data that have to be normally distributed,
but rather the residuals -- or equivalently, the response *conditioned*
on the predictors. The linear model is, after all, given by Y = B*X +
B0 + e, with e ~ N(0, sigma). And in the mixed-model case, you have
additional normal distributions mixing in there via the random
effects.)
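(To make that point concrete, here is a toy simulation -- numbers
invented purely for illustration. The raw response is strongly bimodal,
yet the model assumptions hold exactly, because the residuals are
normal:)

```r
# Y itself can look very non-normal even when the model is correct:
# here Y is bimodal because X splits into two far-apart groups, but
# the residuals (Y conditioned on X) are N(0, 1) by construction.
set.seed(42)
x <- rep(c(0, 10), each = 250)       # two groups, far apart
y <- 2 + 3 * x + rnorm(500, sd = 1)  # Y = B0 + B*X + e

fit <- lm(y ~ x)
hist(y)            # bimodal
hist(resid(fit))   # approximately normal
```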

I would check your model fit by looking at, e.g.:

- fitted vs. actual (you can fake this by using faceting in ggplot or
  the | operator in lattice, or you can do it for real using the
  predict() function)
- fitted vs. residuals ( plot(lme.model) does this for you )
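(In code, it's just a couple of lines; I'm using the built-in 'cars'
data and a plain lm() fit as a stand-in for your model:)

```r
# Toy model so the snippet runs on its own; substitute your own fit.
fit <- lm(dist ~ speed, data = cars)

# Fitted vs. actual: points should hug the 45-degree line.
plot(fitted(fit), cars$dist, xlab = "Fitted", ylab = "Observed")
abline(0, 1)

# Fitted vs. residuals: look for fanning or curvature.
# (For an lme fit, plot(lme.model) produces this automatically.)
plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0)
```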

The former will tell you whether your model accurately represents your
data (and where it fails to do so), while the latter will give you a
visual impression of how bad the violation of normality of the
residuals is.

More important to me than fulfilling distributional assumptions is
seeing how well the model actually fits the data and how good it is at
predicting new data. (Techniques like cross-validation, or posterior
predictive checks in the Bayesian framework, are based on this idea as
well.) Violating the testing assumptions does mean that not all the
frequentist guarantees hold, but if your model does a good job of
describing old and predicting new data in practice, then that may be
enough for many applications.
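(Cross-validation doesn't need any special packages; a bare-bones
k-fold version is only a few lines. Again using 'cars' as a stand-in
for your own data and model:)

```r
# Bare-bones 5-fold cross-validation for a linear model: fit on
# four folds, measure prediction error (RMSE) on the held-out fold.
set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(cars)))

cv_rmse <- sapply(1:k, function(i) {
  train <- cars[folds != i, ]
  held  <- cars[folds == i, ]
  m     <- lm(dist ~ speed, data = train)
  sqrt(mean((held$dist - predict(m, newdata = held))^2))
})
mean(cv_rmse)   # average out-of-sample error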

Now, if you're focussed on significance tests or traditional confidence
intervals, violations of model assumptions can ruin your day, as you
can no longer trust that the null distribution is correct. However, you
can still use bootstrapped confidence intervals, although those take
much longer to compute.
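(If you end up with an lme4 fit, confint(model, method = "boot") will
do the bootstrapping for you; for other models you can roll your own
in base R. A minimal sketch of a nonparametric bootstrap CI for a
regression slope, with 'cars' once more standing in for your data:)

```r
# Percentile bootstrap CI for the slope: resample rows with
# replacement, refit, and take the empirical 2.5%/97.5% quantiles
# of the resulting slope estimates.
set.seed(1)
boot_slopes <- replicate(2000, {
  idx <- sample(nrow(cars), replace = TRUE)
  coef(lm(dist ~ speed, data = cars[idx, ]))[["speed"]]
})
quantile(boot_slopes, c(0.025, 0.975))  # percentile 95% CI
```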

One final thing: I see you're using Type-III tests. Even though that's
the default in certain popular commercial statistical packages, I would
encourage you to think twice before doing so. Read Venables'
Exegeses on Linear Models and see some of the things John Fox (author
of the car package) has written on this topic (e.g. https://stat.ethz.ch/
pipermail/r-help/2006-August/111927.html). A lot of ink has been
spilled on this topic, and it's not always black and white, but I
generally find that Type-II tests are actually what I want.
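(For Type-II tests in R, car::Anova(model, type = "II") is the usual
route. You can see why the default sequential -- Type-I -- anova() can
be unsatisfying with unbalanced data in base R alone: the sum of
squares for a term depends on where it appears in the formula. Using
the built-in 'mtcars' data, which is unbalanced across these factors:)

```r
# Sequential (Type-I) tests depend on term order with unbalanced
# data; Type-II tests adjust each term for the others and do not.
m1 <- lm(mpg ~ factor(cyl) + factor(am), data = mtcars)
m2 <- lm(mpg ~ factor(am) + factor(cyl), data = mtcars)
anova(m1)  # SS for factor(am) adjusted for factor(cyl)
anova(m2)  # SS for factor(am) unadjusted -- different numbers
```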

Best,
Phillip
On Fri, 2016-11-04 at 06:16 -0600, Joseph Aidoo wrote: