Random versus fixed effects
I have made this sort of comment before, but I think it important enough to have another go, in a bit more detail.

The extent of generalization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Surely the key issue is: "To what population do you wish to generalize?" If one wants to generalize to other schools, then (as in the science data set in DAAG) one must have data that can be treated as a random sample of schools. For the science data, it turns out that the schools component of variance is so small that it can be treated as zero -- differences between classes seem, apart from individual variation, the only random effect needed. Moreover, degrees of freedom = 39 for the schools component of variance is large enough that omission of this ~0 component makes little difference to the inference. Thus it may reasonably be omitted, simplifying the analysis. [Those who want to avoid talk of degrees of freedom might go directly to a comparison of the two inferences, one with the schools component of variance and the other without. Degrees of freedom are a rough, but often useful, measure of information.]

What if the degrees of freedom for the schools component had been small, and omission of this component did affect the inference? Subject-area knowledge and experience must then come into play: is a schools component of variance likely? Do other studies show evidence of it? If so, of what magnitude? And so on.

Normality
~~~~~~~~~
This, while sometimes important, is a second-order issue. The Central Limit Theorem comes to our aid if there is some modest number of degrees of freedom at the relevant level. One can always try transforming the data if it seems grossly non-normal at the relevant level of variation. (Checking this is, however, non-trivial; plots of residuals typically mix in other, not-all-that-relevant, levels of variation.)

Fixed effects
~~~~~~~~~~~~~
If the intention is to make statements only about the specific schools included in the study, then schools may be treated as fixed effects.
In this case, for the science data set, there is no detectable difference between schools, and such a fixed effect can be omitted.

Other reasons for use of random effects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As has been mentioned, there may be reasons other than the wish to generalize appropriately for modeling an effect as random. If one is comparing 50 varieties of wheat, the estimates that are at the extremes will likely over-estimate the relevant effects. The BLUPs that are calculated from an analysis that treats the variety effects as random pull the estimates in towards the mean by amounts that, under often-plausible model assumptions, are appropriate.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
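The comparison John describes -- fitting the science data with and without the schools component of variance and checking that the inference barely changes -- can be sketched roughly as below. This is an illustrative sketch only: the formula and variable names (like, school, class, sex, PrivPub) are assumed from the DAAG package's science data set and should be checked against the actual data.

```r
## Sketch only: variable names assumed from the DAAG science data set.
library(DAAG)   # science data set
library(nlme)

sci <- na.omit(science)

## Schools and classes both as random effects
fm.both  <- lme(like ~ sex + PrivPub, data = sci,
                random = ~ 1 | school/class)
## Classes only, with the ~0 schools component omitted
fm.class <- lme(like ~ sex + PrivPub, data = sci,
                random = ~ 1 | class)

## If the schools component of variance is effectively zero, the
## fixed-effect estimates and their standard errors should barely differ.
summary(fm.both)$tTable
summary(fm.class)$tTable
```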
On 5 Jul 2008, at 12:16 AM, Rune Haubo wrote:
Hi Luis

I largely agree with Mike's answer and have the following additional comments: the decision of whether a variable is taken as fixed or random often rests on subject-specific matters. An important question is: can the levels of the variable be considered as coming from a normal distribution? But other aspects also play a role, such as the number of realized levels of the variable (with only a few levels, it will often be appropriate to treat the variable as fixed anyhow). The models rest on different distributional assumptions, so the decision is often based on weighing the appropriateness of these assumptions.

To give more specific advice on the actual model comparison (ignoring the question of the appropriateness of the comparison), it matters whether you are thinking in terms of linear mixed models or generalized linear mixed models. In the former case, assuming you have only one random effect and assuming lme is sufficient, you can do

  fm.lme <- lme(....)
  fm.lm <- lm(...)
  anova(fm.lme, fm.lm)

If you are thinking in terms of generalized linear mixed models, and you are using lmer, then maybe you can use something like

  deviance(fm.lmer <- lmer(...))
  deviance(fm.glm <- glm(...))

however, the reference distribution for the difference in deviances depends on the actual content of the function calls.

Regards
Rune

2008/7/4 Luis Orlindo Tedeschi <luis.tedeschi at gmail.com>:
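A filled-in version of the lme-vs-lm comparison Rune outlines might look as follows. This is a sketch under assumptions: the Orthodont data set ships with nlme, but the choice of formula here is arbitrary, and the usual boundary caveat applies to the test.

```r
## Illustrative sketch: Orthodont ships with nlme; the formula is arbitrary.
library(nlme)

fm.lme <- lme(distance ~ age, random = ~ 1 | Subject,
              data = Orthodont, method = "ML")  # ML so the lm fit is comparable
fm.lm  <- lm(distance ~ age, data = Orthodont)

anova(fm.lme, fm.lm)  # likelihood-ratio test of the Subject variance component
```

Note that the null value of a variance component lies on the boundary of its parameter space, so the naive chi-squared reference distribution is conservative; halving the reported p-value is a common rough correction.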
Thanks Mike... and I thought it would have a single answer... I glanced over the link you provided; it will take me some time to digest it. My current problem is comparing a model with variable A as a random effect vs a model with variable A as a fixed effect. It gets very confusing. Thanks again.

Luis

On Fri, 2008-07-04 at 09:28 +0100, Mike Dunbar wrote:
Dear Luis

It is not necessarily straightforward, but there is a lot of information out there that can help you. Take a look at http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests and also look through the archives of this list, e.g. the thread entitled "[R-sig-ME] interpreting significance from lmer results for dummies (like me)".

regards
Mike
Luis Orlindo Tedeschi <luis.tedeschi at gmail.com> 03/07/2008 22:23 >>>
Folks; I have a quick question about model comparison. Is it OK to use BIC/AIC/-2 log-likelihood to compare models with different fixed and random effects, and even different variance-covariance structures? How can I accomplish this using R? Will anova() do the correct comparison of different models? Thanks in advance.

Luis

--
Luis Orlindo Tedeschi <luis.tedeschi at gmail.com>
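For the information-criterion comparison asked about here, a minimal sketch is below. The data set (sleepstudy, shipped with lme4) and the two models are purely illustrative; the important point is to refit with ML (REML = FALSE), since REML likelihoods are not comparable between models with different fixed effects.

```r
## Illustrative sketch: sleepstudy ships with lme4; the two models
## compared are arbitrary examples.
library(lme4)

fm1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
fm2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)

AIC(fm1, fm2)    # smaller AIC is preferred
anova(fm1, fm2)  # likelihood-ratio test (valid here since the models are nested)
```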
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
+----------------------------------------------------+
Luis O. Tedeschi, PhD, PAS
Assistant Professor
Texas A&M University
230 Kleberg Center p. (+1) 979-845-5065
2471 TAMU f. (+1) 979-845-5292
College Station, TX 77843-2471
http://nutritionmodels.tamu.edu
http://nutr.tamu.edu
http://people.tamu.edu/~luis.tedeschi
+----------------------------------------------------+
--
Rune Haubo Bojesen Christensen
Master Student, M.Sc. Eng.
Phone: (+45) 30 26 45 54
Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com
DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 321, DK-2800 Kgs. Lyngby, Denmark