Random versus fixed effects
I have made this sort of comment before, but I think it important enough to have another go, in a bit more detail.

The extent of generalization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Surely the key issue is: "To what population do you wish to generalize?" If one wants to generalize to other schools, then (as in the science data set in DAAG) one must have data that can be treated as a random sample of schools. For the science data, it turns out that the schools component of variance is so small that it can be treated as zero -- differences between classes seem, apart from individual variation, the only random effect needed. Moreover, degrees of freedom = 39 for the schools component of variance is large enough that omission of this ~0 component makes little difference to the inference. Thus it may reasonably be omitted, simplifying the analysis. [Those who want to avoid talk of degrees of freedom might go directly to a comparison of the two inferences, one with the schools component of variance and the other without. Degrees of freedom are a rough, but often useful, measure of information.]

What if the degrees of freedom for the schools component had been small, and omission of this component did affect the inference? Subject-area knowledge and experience must then come into play: is a schools component of variance likely? Do other studies show evidence of it? If so, of what magnitude? And so on.

Normality
~~~~~~~~~
This, while sometimes important, is a second-order issue. The Central Limit Theorem comes to our aid if there is some modest number of degrees of freedom at the relevant level. One can always try transforming the data if it seems grossly non-normal at the relevant level of variation. (Checking this is, however, non-trivial; plots of residuals typically mix in other, not-all-that-relevant, levels of variation.)

Fixed effects
~~~~~~~~~~~~~
If the intention is to make statements only about the specific schools included in the study, then schools may be treated as fixed effects.
In this case, for the science data set, there is no detectable difference between schools, and such a fixed effect can be omitted.

Other reasons for use of random effects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As has been mentioned, there may be reasons other than the wish to generalize appropriately for modeling an effect as random. If one is comparing 50 varieties of wheat, the estimates that are at the extremes will likely over-estimate the relevant effects. The BLUPs that are calculated from an analysis that treats the variety effects as random pull the estimates in towards the mean by amounts that, under often-plausible model assumptions, are appropriate.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
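The comparison John describes -- fitting the science data with and without the schools component of variance and checking that the inference barely changes -- can be sketched roughly as below. This is an illustrative sketch only: the formula and variable names (like, school, class, sex, PrivPub) are assumed from the DAAG package's science data set and should be checked against the actual data.

```r
## Sketch only: variable names assumed from the DAAG science data set.
library(DAAG)   # science data set
library(nlme)

sci <- na.omit(science)

## Schools and classes both as random effects
fm.both  <- lme(like ~ sex + PrivPub, data = sci,
                random = ~ 1 | school/class)
## Classes only, with the ~0 schools component omitted
fm.class <- lme(like ~ sex + PrivPub, data = sci,
                random = ~ 1 | class)

## If the schools component of variance is effectively zero, the
## fixed-effect estimates and their standard errors should barely differ.
summary(fm.both)$tTable
summary(fm.class)$tTable
```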
On 5 Jul 2008, at 12:16 AM, Rune Haubo wrote:
Hi Luis

I largely agree with Mike's answer and have the following additional comments: the decision of whether a variable is taken as fixed or random often rests on subject-specific matters. An important question is: can the levels of the variable be considered as coming from a normal distribution? But other aspects also play a role, such as the number of realized levels of the variable (with only a few levels, it will often be appropriate to treat the variable as fixed anyhow). The models rest on different distributional assumptions, so the decision is often based on weighing the appropriateness of these assumptions.

To give more specific advice on the actual model comparison (ignoring the question of the appropriateness of the comparison), it matters whether you are thinking in terms of linear mixed models or generalized linear mixed models. In the former case, assuming you have only one random effect and assuming lme is sufficient, you can do

  fm.lme <- lme(....)
  fm.lm <- lm(...)
  anova(fm.lme, fm.lm)

If you are thinking in terms of generalized linear mixed models, and you are using lmer, then maybe you can use something like

  deviance(fm.lmer <- lmer(...))
  deviance(fm.glm <- glm(...))

however, the reference distribution for the difference in deviances depends on the actual content of the function calls.

Regards
Rune

2008/7/4 Luis Orlindo Tedeschi <luis.tedeschi at gmail.com>:
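A filled-in version of the lme-vs-lm comparison Rune outlines might look as follows. This is a sketch under assumptions: the Orthodont data set ships with nlme, but the choice of formula here is arbitrary, and the usual boundary caveat applies to the test.

```r
## Illustrative sketch: Orthodont ships with nlme; the formula is arbitrary.
library(nlme)

fm.lme <- lme(distance ~ age, random = ~ 1 | Subject,
              data = Orthodont, method = "ML")  # ML so the lm fit is comparable
fm.lm  <- lm(distance ~ age, data = Orthodont)

anova(fm.lme, fm.lm)  # likelihood-ratio test of the Subject variance component
```

Note that the null value of a variance component lies on the boundary of its parameter space, so the naive chi-squared reference distribution is conservative; halving the reported p-value is a common rough correction.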
Thanks Mike... and I thought it would have a single answer... I glanced over the link you provided; it will take me some time to digest it. My current problem is comparing a model with variable A as a random effect vs a model with variable A as a fixed effect. It gets very confusing. Thanks again.

Luis

On Fri, 2008-07-04 at 09:28 +0100, Mike Dunbar wrote:
Dear Luis

It is not necessarily straightforward, but there is a lot of information out there that can help you. Take a look at http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests and also look through the archives of this list, e.g. the thread entitled "[R-sig-ME] interpreting significance from lmer results for dummies (like me)".

regards
Mike
Luis Orlindo Tedeschi <luis.tedeschi at gmail.com> 03/07/2008 22:23 >>>
Folks; I have a quick question about model comparison. Is it OK to use BIC/AIC/-2 log-likelihood to compare models with different fixed and random effects, and even different variance-covariance structures? How can I accomplish this using R? Will anova() do the correct comparison of different models? Thanks in advance.

Luis

--
Luis Orlindo Tedeschi <luis.tedeschi at gmail.com>
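For the information-criterion comparison asked about here, a minimal sketch is below. The data set (sleepstudy, shipped with lme4) and the two models are purely illustrative; the important point is to refit with ML (REML = FALSE), since REML likelihoods are not comparable between models with different fixed effects.

```r
## Illustrative sketch: sleepstudy ships with lme4; the two models
## compared are arbitrary examples.
library(lme4)

fm1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
fm2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)

AIC(fm1, fm2)    # smaller AIC is preferred
anova(fm1, fm2)  # likelihood-ratio test (valid here since the models are nested)
```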
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
+----------------------------------------------------+
Luis O. Tedeschi, PhD, PAS
Assistant Professor
Texas A&M University
230 Kleberg Center p. (+1) 979-845-5065
2471 TAMU f. (+1) 979-845-5292
College Station, TX 77843-2471
http://nutritionmodels.tamu.edu
http://nutr.tamu.edu
http://people.tamu.edu/~luis.tedeschi
+----------------------------------------------------+
--
Rune Haubo Bojesen Christensen
Master Student, M.Sc. Eng.
Phone: (+45) 30 26 45 54
Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com
DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 321, DK-2800 Kgs. Lyngby, Denmark